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Abstract 

We  introduce  an  approach  to  velocity  and  reflectivity  estimation  based 
on  optimizing  the  coherence  of  multiple  shot-gather  inversions  of  reflec¬ 
tion  seismograms.  The  resulting  algorithm  appears  to  avoid  the  severe 
convergence  difficulties  reported  for  output  (nonlinear)  least-squares  in¬ 
version.  We  describe  in  detail  an  algorithm  appropriate  for  the  layered 
acoustic  model,  using  the  convolutional  model  of  the  plane-wave  (p-tau) 
seismogram.  We  give  a  theoretical  analysis  and  numerical  evidence  that 
coherency  optimization,  as  defined  here,  yields  stable  and  reasonably  ac¬ 
curate  estimates  of  both  velocity  trend  and  reflectivity,  by  exploiting  re¬ 
flection  phase  moveout  and  amplitudes  in  a  computationally  efficient  way. 
We  also  indicate  how  the  approach  may  be  modified  to  apply  to  later¬ 
ally  heterogeneous  acoustic  models,  and  (more  briefly)  to  determination 
of  elastic  models  and  source  parameters  as  well. 


1  Introduction 

Multi-offset  model-based  inversion  is  of  interest  in  the  processing  of  wide-angle 
seismic  reflection  data,  both  for  structural  information  and  for  the  extraction 
of  reflection  characteristics  directly  indicating  the  presence  of  hydrocarbon  de¬ 
posits.  Almost  by  definition,  techniques  designed  to  extract  such  information — 
ray-trace  velocity  estimation,  amplitude-preserving  before-stack  migration — 
have  the  character  of  partial  inversion  algorithms,  and  analysis  often  reveals 
that  the  relation  is  quite  deep. 

Inversion  algorithms  based  on  the  full  seismic  waveform  have  been  discussed 
extensively  over  the  past  several  years;  see  e.g.  Tarantola  (1986),  Gauthier 
et  al.  (1986),  MacAulay  (1985),  Mora  (1986),  Kolb  et  al.  (1986),  Lines  and 
Treitel  (1984),  Bamberger  et  al.  (1982),  for  a  small  sample.  All  of  the  work 
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just  cited  is  based  on  the  output  least-squares  principle,  according  to  which  a 
subsurface  model  is  required  to  generate  (synthetic)  sections  which  simultane¬ 
ously  fit  a  corresponding  set  of  data  sections  in  the  least  mean-square  error 
sense.  This  approach  does  not  require  picked  travel  times,  unlike  reflection 
tomography  (Bube  et  al.  (1985),  Lines  et  at.  (1987)),  and  in  principle  could 
extract  an  optimal  distribution  of  seismic  velocities  as  well  as  reflection  am¬ 
plitudes,  unlike  linearized  inversion  (Ikelle  et  al.  (1988),  Cohen  and  Bleistein 
(1979),  Clayton  and  Stolt  (1981),  Beylkin  (1985)).  In  addition,  any  desired 
level  of  detail  concerning  the  physics  of  seismic  wave  propagation  may  be  built 
into  the  output-least-squares  principle,  and  it  may  also  incorporate  non-seismic 
constraints  (Lines  et  al.  1988). 

In  practice,  it  has  proven  difficult  to  realize  the  apparent  promise  of  least- 
squares  inversion  even  when  applied  to  synthetic  data  sets.  Since  the  problems 
are  computationally  large,  only  iterative  methods  are  feasible,  and  these  require 
for  their  efficient  convergence  a  measure  of  convexity  often  not  possessed  by  the 
mean-square  seismogram  error  (see  e.g.  Tarantola  (1986),  Hadjee  and  Collino 
(1988),  Santosa  and  Symes  (1986),  (1987)). 

The  cause  of  this  non-convexity  is  the  extreme  sensitivity  of  the  synthetic 
seismogram  to  changes  in  the  slowly-varying  components  of  velocity,  as  will 
be  reviewed  in  Section  2.  Thus  velocity  trends  emerge  slowly  or  not  at  all 
as  iteration  proceeds,  and  the  reflectivities  are  correspondingly  degraded.  For 
layered  medium  problems,  Kolb  and  others  have  developed  a  number  of  con¬ 
tinuation  and  reparameterization  strategies  which  render  the  optimization  more 
tractable  (Kolb,  Collino  and  Lailly  (1986),  Canadas  and  Kolb  (1986)).  Only  lim¬ 
ited  tests  of  these  devices  (and  no  theoretical  justification)  have  been  reported, 
however,  and  it  is  difficult  to  understand  how  lateral  heterogeneity  might  be 
accommodated  by  these  techniques.  Straightforward  least-squares  inversion  of 
two-dimensional  acoustic  and  elastic  models  has  appeared  to  require  a  priori  in¬ 
formation  of  the  gross  model  (velocity)  features  of  very  high  quality  and,  when 
such  information  is  provided,  the  results  closely  resemble  those  of  carefully  de¬ 
signed  amplitude-preserving  migration — unsurprisingly,  as  these  are  essentially 
equivalent  (Lailly  (1984),  Gauthier  et  al.  (1986),  Mora  (1987)).  In  fact,  the 
principal  tangible  result  of  the  work  on  least-squares  inversion  so  far  has  been 
to  provide  a  version  of  migration  which  may  conceivably  yield  estimates  of  elas¬ 
tic  reflection  amplitudes  (Tarantola  (1984),  Mora  (1987),  Burridge  and  Beylkin 
(1988)),  hence  a  rational  basis  for  amplitude  versus  offset  analysis.  While  this 
is  an  important  step,  it  appears  that  least-squares  inversion  has  not  so  far  ad¬ 
vanced  the  estimation  of  velocities  in  regions  of  complex  structure,  and  without 
such  velocity  information  its  depth-migration  function  is  disabled  as  well  (San¬ 
tosa  and  Symes  (1986),  Spratt  (1987)). 

It  is  important  to  understand  that  the  difficulty  is  not  due  to  inadequate 
modeling  of  seismic  wave  propagation,  or  to  the  features  of  field  data  (beyond 
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specifying  the  general  character  of  any  useful  model).  That  is,  least-squares 
inversion  fails  for  essentially  mathematical,  rather  than  (geo-)physical,  reasons. 
These  reasons  are  explored  in  depth  in  the  monograph,  Santosa  and  Symes 
(1986). 

The  purpose  of  this  paper  is  to  propose  another  approach  to  full-waveform 
seismic  reflection  inversion,  in  which  the  coherence  of  multiple  reflectivity  esti¬ 
mates  is  optimized.  While  the  basic  ideas  behind  this  approach  are  quite  old, 
our  use  of  them  in  designing  an  optimization  algorithm  for  inversion  seems  new. 
Moreover,  coherency  optimization  appears  to  avoid  the  mathematical  pitfalls 
which  often  prove  fatal  to  output-least-squares  inversion  while  producing  the 
same  type  of  subsurface  estimate. 

Minimal  requirements  for  a  useful  model-based  inversion  algorithm  might  be 
stated  as  follows: 

(a)  Stability:  at  least  for  data  which  are  nearly  model- consistent,  the  esti¬ 
mates  of  subsurface  properties  should  u degrade  gracefully”  in  the  pres¬ 
ence  of  data  noise; 

(b)  Computability:  for  iterative  methods,  this  means  that  convergence  should 
occur  at  a  reasonable  rate,  and  for  “poor”  initial  estimates,  i.e.,  con¬ 
vergence  should  not  require  knowing  the  answer  beforehand; 

(c)  Completeness:  the  algorithm  should  extract  “most”  of  the  information 
about  the  model  implicit  in  the  data,  and  the  user  should  be  able  to  have 
confidence  that  it  has  done  so. 

Of  course,  an  algorithm  which  meets  these  conditions  may  still  fail:  be¬ 
sides  these  mathematical  requirements,  one  must  also  impose  the  (geo-)physical 
requirement  that  the  underlying  model  faithfully  reflect  the  physics  of  the  re¬ 
flection  seismology  experiment,  at  least  at  the  level  of  detail  desired  in  the 
subsurface  estimate.  While  this  physical  consistency  is  the  ultimate  limiting 
factor  in  the  utility  of  any  inversion  algorithm,  our  focus  in  this  paper  will  be 
limited  to  the  necessary  preconditions  (a)-(c)  above. 

In  the  remainder  of  this  introduction,  we  shall  outline  the  coherency  opti¬ 
mization  algorithm  in  general  terms.  In  succeeding  sections  we  shall: 

§£:  formulate  coherency  optimization  precisely  for  the  simplest  interesting 
model:  layered,  constant- density  acoustics; 

§3:  analyse  the  algorithm  of  §2  in  some  detail,  demonstrate  that  it  possesses 
the  attributes  (a)  through  (c)  above; 

describe  a  numerical  implementation,  and  exhibit  the  results  of  some 
numerical  experiments,  which  establish  feasibility; 
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§5;  formulate  coherency  optimization  for  the  simplest  non-layered  model, 
i.e.  general  constant-density  acoustics,  to  demonstrate  that  this  ap¬ 
proach  is  not  an  artifact  of  wave  propagation  in  layered  media. 

In  the  concluding  Section  6  we  recapitulate  the  properties  of  coherency  op¬ 
timization  and  its  comparison  with  other  approaches,  and  offer  a  few  more 
observations  concerning  the  range  of  models  to  which  the  technique  might  be 
applied. 

The  paper  is  written  so  that  Sections  3  and  5  are  not  prerequisite  to  the 
others.  The  basic  assertions  of  the  paper  are  stated  verbally  in  this  introduction, 
then  with  some  mathematical  precision  in  Section  2,  and  illustrated  numerically 
in  Section  4.  Thus  the  reader  who  wishes  to  avoid  the  analytical  justification  of 
these  assertions  can  skip  Section  3  without  losing  the  thread  of  the  argument, 
and  those  interested  only  in  layered  problems  may  likewise  avoid  Section  5. 

Coherency  optimization  is  easiest  to  describe  in  the  context  of  the  lin¬ 
earized  seismogram,  in  which  the  model  is  regarded  as  the  sum  of  smooth 
background  model  and  a  highly  oscillatory  (“rough”)  perturbation  (reflectiv¬ 
ity  section).  Note  that  the  smooth  background  velocities  are  still  regarded  as 
part  of  the  model,  so  it  is  still  nonlinear.  With  some  high-frequency  approxima¬ 
tions  (“WKBJ  Seismogram”)  the  inversion  of  each  common-source  (or  common- 
receiver  or  common  offset)  gather  for  a  given  background  velocity  may  be  accom¬ 
plished  semi-analytically  (perhaps  the  best  reference  is  Beylkin  (1985)),  by  what 
amounts  to  an  amplitude-preserving  before-stack  migration.  If  the  background 
model  is  correct,  presumably  the  images  from  the  various  inverted  gathers  will 
line  up.  If  not,  the  discrepancy  indicates  that  the  background  (“migration”) 
velocity  ought  to  be  changed. 

So  far,  this  is  hardly  novel:  a  process  called  “iterative  before-stack  migra¬ 
tion”  is  described  in  just  this  way  in  Kleyn  (1979),  for  instance.  The  refine¬ 
ment  leading  to  a  feasible  algorithm  is  the  introduction  of  a  well-behaved  cost 
function,  which  we  dub  the  incoherence ,  which  both  measures  the  discrepancy 
between  the  various  common  gather  inversions  and  indicates  how  the  velocity 
fields  should  be  changed  to  line  them  up  (through  its  gradient).  A  natural  choice 
is  stack  power  or  semblance  (Fowler  (1986),  for  example),  but  this  leads  directly 
back  to  the  nonconvexity  difficulties  of  output-least-squares:  the  two  are  very 
closely  related  (Santosa  and  Symes  (1988),  Appendix  A).  Instead,  we  depend 
on  the  accurate  inversion  of  amplitudes,  and  take  the  mean-square  of  the  differ¬ 
ences  between  successive  inverted  gathers.  These  differences  should  all  vanish  at 
the  “correct”  velocity  estimate.  Most  important,  as  the  background  velocity  is 
changed,  this  incoherence  (or  “differential  semblance”)  changes  smoothly,  rather 
than  abruptly  as  is  the  case  with  stack  power  (this  is  illustrated  in  Section  4). 

Two  technical  refinements  are  necessary  to  turn  this  idea  into  an  algorithm. 
First,  data  noise  which  is  uncorrelated  from  gather  to  gather  will  yield  a  very 
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large  contribution  to  the  differences  of  the  inverted  gathers,  completely  distort¬ 
ing  the  incoherence  and  masking  the  correct  choice  of  velocity.  To  avoid  this 
oversensitivity  to  data  noise,  partly  decouple  the  inverted  gathers  from  the  data: 
lump  these  reflectivities  (one  per  gather)  together  with  the  background  velocity 
field  as  the  “model”  to  be  determined,  and  redefine  the  cost  to  be  the  sum  of: 

(a)  all  of  the  mean-square  data  errors,  i.e.  differences  between  data  seismo¬ 
gram  and  predicted  seismogram  based  on  the  corresponding  reflectivity 
section,  and 

(b)  the  mean-square  sum  of  pairwise  differences  of  reflectivity  sections  (in¬ 
coherence). 

Incoherent  noise  in  the  data  gathers  will  be  accounted  for  mostly  in  error- 
of-fit  (i.e.  (a)),  as  it  should  be,  since  it  causes  a  much  smaller  increase  in  cost 
that  way  than  when  forced  into  the  incoherence  (i.e.  (b)). 

The  second  technical  modification  is  required  by  the  nature  of  the  model-to- 
seismogram  relation:  as  pointed  out  above,  this  relation  is  extremely  sensitive 
to  the  background  velocity  component,  when  the  model  is  defined  in  terms  of 
depth  (and  offset).  This  oversensitivity  is  due  to  the  time  shift  accompanying 
a  change  in  background  velocity,  which  causes  the  temporal  location  of  a  high- 
frequency  reflection  from  a  (depth-)  fixed  reflector  to  change.  For  a  discussion 
of  this  “time-shift”  disaster  in  the  one-dimensional  context  see  Symes  (1986). 
The  remedy  is  clear:  the  reflectivity  sections  should  be  defined  as  time  sections 
rather  than  as  depth  sections. 

For  the  layered  acoustic  problem  which  occupies  most  of  this  paper,  it  is 
rather  trivial  to  accomplish  this  transformation.  It  is  particularly  convenient  to 
work  with  plane  wave  sources,  since  the  gather  corresponding  to  a  planewave 
source  at  definite  slowness  has  only  one  independent  trace.  Thus  a  single  reflec¬ 
tivity  time  trace  at  each  slowness,  together  with  the  (single)  background  velocity 
depth  profile,  specifies  the  model.  The  reflectivity  time  traces  are  converted  to 
depth  traces  through  the  (background)  vertical  travel-time-to-depth  transfor¬ 
mation  appropriate  to  each  slowness.  The  high-frequency  approximation  to  the 
(p-tau)  seismogram  is  simply  the  convolution  of  the  time-trace  reflectivities  with 
the  source  wavelet  whereas  the  incoherence  is  essentially  the  derivative  of  the 
corresponding  depth-trace  reflectivities  with  respect  to  slowness. 

For  laterally  heterogeneous  models,  the  transformation  from  time  to  depth  is 
less  obvious.  As  explained  in  Section  5,  the  analogue  of  the  travel-time-to-depth 
change  of  variables  appropriate  to  plane  waves  in  layered  media  is  in  general  a 
before-stack  Kirchhoff  migration  appropriate  to  the  acquisition  geometry  being 
simulated,  and  the  reflectivity  gather  is  simply  a  time  section.  The  incoherence 
is  a  difference  (or  derivative)  with  respect  to  shot  location  (for  shot  gathers)  of 
the  migrated  reflectivity  depth  sections. 
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Two  points  remain  to  be  emphasized.  First,  there  is  nothing  to  “cohere” 
unless  reflectors  are  present,  i.e.  unless  the  reflectivity  gathers  are  rich  in  high- 
frequency  energy.  As  is  the  case  with  semblance,  the  incoherence  is  a  measure¬ 
ment  of  the  portion  of  moveout  in  reflection  phase  not  accounted  for  by  the 
current  velocity  model.  Therefore  the  density  of  reflectors  is  a  limiting  factor 
in  the  recovery  of  accurate  velocity  models,  as  is  the  data  aperture.  These 
observations  are  quantified  in  Section  3. 

Second,  both  incoherence  and  data  misfit  are  summed  in  the  cost  functional 
for  coherency  optimization.  Thus  both  are  minimized  simultaneously.  The 
major  theoretical  result  of  the  paper  (Section  3)  is  that,  with  appropriate  weights 
on  the  two  components,  sufficiently  dense  reflectors,  and  enough  aperture  the 
cost  functional  is  smooth  and  convex  for  near-consistent  data.  The  favorable 
computational  consequences  are  exhibited  in  Section  4. 


2  Output  least-squares  vs.  coherency  optimiza¬ 
tion 


Denote  the  plane- wave  (“p-r”)  seismogram  corresponding  to  a  velocity  profile 
c(z)  by  5[c].  The  arguments  in  this  paper  are  based  on  the  well-known  convo¬ 
lutional  approximation,  which  is  reasonably  accurate  when  c  may  be  re-written 
as  a  sum 

c  + Ac 


where 


c(z)  is  a  slowly  varying  background  velocity  model 

A  c(z)  is  a  rapidly  varying  “reflector  sequence,”  having  locally  zero 
mean  on  the  length  scale  of  significant  change  in  the  background 
velocity. 

Thus  the  (two-way)  travel-time  to  depth  z  of  a  precritical  plane-wave  at  (hori¬ 
zontal)  slowness  p  is  determined  with  a  small  error  by  the  background  velocity 
c(z)  according  to 


r(;,p)  =  2_(  d/(^ 


1 

V 


where  v  is  the  vertical  (plane-wave)  velocity  at  slowness  p: 


(  \  — 1/2 

V(2>P)  =  =c(z)A(z,p) 

A (z,p)  =  (1  -  c2(z)p2)"1/2. 
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The  convolutional  approximation  with  (isotropic)  source  wavelet  f(t)  is  then 


5[c,  Ac](t,p)  =  v(0,p)  l(f*%)(t,p) 

where  the  “reflectivity”  r(t,p)  is  given  by 


(1) 


Kf.p) 


At>(C(*,p),p) 

«(C(*,p),p) 


(2) 


by  means  of  the  inverse  two-way  travel-time  function  £,  defined  implicitly  by 


t  = 


2 


Jo  v’ 


and  the  vertical  velocity  perturbation 


Av  —  Ac  ■  A3. 


Note  that  reflectivity  conventionally  means  dr/dt ;  we  shall  confuse  r  and  its 
t-derivative,  calling  both  “reflectivity”  as  convenient. 

Now  S  is  clearly  linear  in  Ac,  but  quite  nonlinear  in  c.  In  fact,  a  change 
in  c  will  typically  result  in  a  change  in  the  “phase”  £,  and  thus  in  a  shift 
in  the  high-frequency  components  of  S,  which  in  turn  derive  from  the  high- 
frequency  components  in  Ac.  Since  such  a  phase-shift  may  have  a  drastic  effect 
on  components  of  the  appropriate  (high)  frequency,  and  since  Ac  must  have  a 
great  deal  of  high-frequency  content  in  order  to  model  the  dense  distribution 
of  reflectors  in  the  typical  sedimentary  column,  one  expects  S  to  be  extremely 
sensitive  to  changes  in  the  background  velocity  c. 

This  oversensitivity  to  background  errors  shows  quite  clearly  in  the  expres¬ 
sion  derived  for  the  perturbation  6S  in  the  seismogram,  due  to  a  perturbation 
6c  in  the  background  velocity  (holding  Ac  fixed,  and  assuming  6c(0)  =  0): 

6S(t,p)  =  v(0 ,p)-7*  ^  (-^v+  ^V)5C) 


where 


6((z,p)  :=  6((r(z,p),p)  = 

v(z,p)  /  dz'v(z' ,p)c-3(z')6c(z') 

Jo 

is  the  phase  perturbation  corresponding  to  Ac,  referred  to  depth/slowness  co¬ 
ordinates. 
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Thus  the  2-derivative  of  Ac  (or,  of  r)  appears  in  the  perturbations  6S  as¬ 
sociated  with  a  background  velocity  perturbation  Sc.  On  the  other  hand,  a 
perturbation  6 Ac  in  Ac  simply  results  in 

6S  ~  S[c,  6 Ac] 

as  S  is  linear  in  Ac  —  thus  no  derivative  of  Ac  is  involved. 

Again,  Ac  must  be  highly  oscillatory  to  model  the  typical  reflector  distribu¬ 
tion,  so  the  derivative  of  Ac  is  typically  much  bigger,  in  any  reasonable  sense, 
than  is  Ac  itself.  Thus  perturbation  of  the  background  velocity  c  has  a  much 
larger  effect  on  S  than  does  perturbation  of  Ac:  in  the  language  of  linear  algebra, 
the  linear  perturbation  map  (“Jacobian”) 

Sc,  SAc  t-*  SS 


is  ill-conditioned. 

Even  worse,  a  straighforward  extension  of  this  reasoning  shows  that  the 
difference  between  the  perturbed  section 

S[c  +  Sc,  Ac  +  6  Ac] 

and  its  linear  prediction 

S[c,  Ac]  +  SS 

can  easily  be  on  the  order  of  S  itself,  even  for  quite  small  Sc. 

To  summarize:  under  realistic  assumptions  on  c  and  Ac, 

(i)  the  Jacobian  SS  is  ill-conditioned; 

(ii)  S  is  poorly  approximated  by  its  linearization. 

Consequently,  the  output  least-squares  objective  function 

J[c,  Ac- Sdata]  :=  J  J  dpdt\S[c,Ac\-  Sdata\2 

is  highly  non-convex,  with  rapidly  changing  gradient,  and  the  optimization  prob¬ 
lem 

mm  J[c,Ac]  Sdata}  (3) 

is  extremely  difficult  to  solve  by  means  of  any  variant  of  Newton’s  method.  For 
extensive  discussion  and  illustration  of  these  points,  consult  Santosa  and  Symes 
[1986], 

The  crux  of  the  difficulty  is  the  interaction  between  the  background  c  and  the 
reflectivity  r  through  the  definition  (2).  Accordingly,  it  is  tempting  to  decouple 
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c  and  r  by  treating  r,  rather  than  Ac,  as  the  “other”  component  of  the  model: 
thus  define 

5[c,r](t,p)  =  u(0,p)-1/*  |^(t,p). 

Certainly  if  r  and  Ac  are  related  by  (2)  then 

S[c,  Ac]  =  S[c,  r]. 


In  fact,  apart  from  the  surface  normalization,  the  background  velocity  c  enters 
the  definition  of  S  only  implicitly,  through  the  condition  (2).  If  we  are  to  use  r 
as  one  of  the  independent  variables,  instead  of  Ac,  we  must  develop  a  condition, 
phrased  only  in  terms  of  c  and  r,  which  guarantees  that  (2)  holds.  Fortunately, 
this  is  rather  easy:  from  the  useful  identity 


At>  _  Ac 
v 3  ~  c3 

it  follows  that,  if  (2)  is  satisfied,  then 


c  3(z)Ac(z)  =  t/  2(z,p)r(r(z,p),p). 


(4) 

(5) 


The  notable  quality  of  (5)  is  that  the  left-hand  side  is  independent  of  p.  Thus 
differentiation  with  respect  to  p  eliminates  Ac  altogether: 

°=  ^[^_2(2,p)Kr(r,p),p)].  (6) 

It  is  easy  to  reverse  this  reasoning.  Thus: 


A  section  r(t,p)  is  the  reflectivity  corresponding  to  Ac(z)  if  and 
only  if  (6)  is  satisfied,  in  which  case  Ac  is  given  in  terms  of  r(t,p) 
and  c(z)  by  (5). 


In  formulating  the  constraint  given  by  (6),  it  is  advantageous  to  return  to  (t,p) 
coordinates  (the  reason  will  become  apparent  below).  Thus  define  the  quantity 
C[c,  r]  (the  incoherency)  by 


C[c,r]{t,p)  = 


[v  2(z,p)r(r(z, 


*=<(«, p) 


Then 


if  and  only  if 


S[c,  Ac]  =  Sdata 


5[c,r]  =  Sdaia 
and  C[c,  r]  =  0 
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with  Ac  and  r  related  by  (2)  or  (5). 

Since  the  problem  is  clearly  overdetermined,  it  is  commonplace  to  replace 
the  exact  fit  of  predicted  to  measured  seismograms  by  a  best-fit  condition.  We 
shall  folow  the  lead  of  most  authors  on  this  subject  (e.g.  Tarantola  and  Val- 
lette  (1982))  and  use  the  mean-square  error  measure.  Some  notation  for  this  is 
desirable:  for  a  single  z-  or  /-trace,  e.g.  /(/),  we  write 

\ 1/2 

l/(0l2J 

whereas  for  a  (t,p)  or  ( z,p )  section,  e.g.  F(t,p),  we  write 

atm&x  /*Pm»x  x  1/2 

dt  J '  dpp\F(t,p)\2  J 

The  weight  p  in  the  integral  defining  |||  |||  is  a  residue  of  polar  coordinates,  and 
is  used  to  make  the  norm  |||  j||  as  close  as  possible  to  equivalent  to  the  ordinary 
mean-square  of  an  x-t  section.  See  Santosa  and  Symes  (1988),  Appendix  B;  also 
Brysk  (1986). 

In  view  of  the  equivalence  noted  above,  it  is  natural  to  pose  the  constrained 
least-squares  problem: 


minimize  |||S[c,  r]  -  Srfa<J||2 
c,  r 

subject  to  C[c,  r]  =  0. 


(7a) 


This  problem  is  closely  related  to  the  output  least-squares  problem  (3).  In 
fact,  the  relation  is  too  close.  The  two  problems  have  the  same  solution,  and  in 
fact  are  mathematically  quite  similar  as  well,  so  that  (7a)  is  also  poorly  suited 
to  computation.  To  obtain  a  problem  which  is  not  too  “stiff”  we  “relax”  (7a) 
by  making  the  constraint  C  =  0  soft: 

minimize  J[c,r,Sdai  J  (7b) 


where 

j[c,r,Sdata }  =  |||S[c,r]  -  Sdata\\\2+cr2\\\C[c,r}\\\2. 

We  shall  call  (7b)  the  coherency  optimization  problem. 

From  the  equivalence  above,  we  see  that  if  Sda^a  is  consistent,  i.e.  Sdaia  — 
S[c,  Ac],  then  the  problem  (7b)  has  [c,  r]  amongst  its  solutions  for  which  r  and 
Ac  are  related  by  (5).  That  is,  for  consistent  data,  (7b)  has  “the  same  solutions” 
as  the  output  least-squares  problem  (3).  We  shall  show,  however,  that  (7b)  is  far 
better  suited  to  numerical  computation  by  Newton’s  method  and  its  relatives. 
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We  shall  also  show  that  the  solution  of  (7b)  is  stable,  i.e.  “degrades  gracefully” 
in  the  presence  of  data  error,  for  reasonable  choices  of  the  penalty  parameter  <r. 

These  conclusions  are  true  provided  that  S^a  is  near  (in  the  mean-square 
sense)  some  “exact”  or  consistent  data  S[c,  Ac],  and  provided  that  c,  Ac  satisfy 
certain  conditions.  “Physical,”  or  poetic,  statements  of  these  conditions  are: 

(i)  Ac  should  be  “rough”,  i.e.  contain  significant  variation  (reflectors), 
on  a  length  scale  dictated  both  by  the  wavelet  (/)  passband  and  by  the 
smoothness  (characteristic  length)  of  c; 

(ii)  The  range  of  slownesses  p  available  in  the  data  (“aperture”)  should  be 
sufficiently  large  relative  to  both  the  degree  of  roughness  mentioned  in 
(i)  and  the  amount  of  data  error,  so  that  the  moveout  of  reflections 
clearly  discriminates  the  velocities. 

It  is  also  necessary  that  the  penalty  parameter  <x  be  chosen  appropriately. 
“Appropriate”  means,  so  that  the  desired  properties  are  obtained,  i.e.,  so  that 
Newton’s  method  works  (from  a  poor  initial  guess)  and  so  that  the  solution 
obtained  is  relatively  insensitive  to  data  error.  The  main  theoretical  result  of 
this  paper  is  that  such  an  appropriate  choice  of  <x  is  possible,  provided  that  (i) 
and  (ii)  are  satisfied  in  a  suitable  sense,  and  that  this  choice  is  robust. 

Note  that  in  the  limit  <x  — *  oo,  (7b)  becomes  (7a),  i.e.  the  constraint  becomes 
hard.  Previous  remarks  have  indicated  that  (7a)  (or  (3))  is  not  susceptible  to 
numerical  solution  through  Newton’s  method  or  its  relatives.  Thus  the  possi¬ 
bility  of  choosing  a  “not  too  large”  is  crucial  to  the  construction  of  algorithms 
which  converge  from  a  poor  initial  guess. 

In  the  following  section  we  will  carefully  quantify  (i)  and  (ii)  and  justify  these 
conclusions  through  an  analysis  of  (7b),  and  in  section  4  we  will  present  the  re¬ 
sults  of  some  numerical  experiments,  along  with  a  discussion  of  implementation 
issues  (e.g.  choice  of  er). 


3  Analysis  of  the  coherency  optimization  prob¬ 
lem 

The  local  analysis  of  optimization  problems  amounts  to  the  verification  of  several 
conditions  concerning  the  first  and  second  derivatives  of  the  objective  function. 
These  conditions  are  imposed  near  a  particular  solution,  and  guarantee  that 
this  solution  is  stable  under  data  perturbations.  The  same  conditions  ensure 
rapid  local  convergence  of  Newton’s  method.  We  state  the  verbal  description  of 
these  conditions  here,  together  with  the  interpretation  in  terms  of  the  quantities 
introduced  in  the  last  section: 
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(i)  ( Regularity )  The  objective  function  should  be  twice  differentiable  near 
the  solution. 

This  is  completely  obvious  for  the  seismogram  error,  as  it  is  independent  of 
c  and  quadratic  in  r.  For  the  incoherency,  the  regularity  is  obvious  from  the 
identity 

C[c,r](t,p)  =  ^  (u(z,p)-2r(r(2,p),p))[ 

=  -2 pr(t,p)  +  u-2«(i,p),p)  {§£(<, p)  -  Pjj(t,p)  /0C(',P)  i)} 

(8) 

Clearly  varying  c  will  have  the  effect  of  differentiating  v  —  but  c,  hence  v,  is 
presumed  sufficiently  smooth  that  its  derivative  is  not  significantly  larger  than 
c  itself.  On  the  other  hand,  no  additional  derivatives  of  r  (the  locus  of  high- 
frequency  energy)  result  from  varying  c.  Thus  C  may  be  regarded  as  regular. 
Note  that  the  perturbation  of  C  would  involve  further  derivatives  of  r  if  we  had 
left  off  the  final  referral  back  to  time/slowness  coordinates. 

(it)  (second-order  sufficiency):  The  objective  Hessian  is  positive-definite. 

Most  of  this  section  is  devoted  to  verifying  this  last  condition;  it  is  the  source 
of  the  requirements  mentioned  at  the  end  of  the  last  section. 

The  Hessian  is  the  second-order  coefficient  62J  in  the  power  series  expansion 
J[c  +  eSc,  r  +  eSr,  Sdata] 

„  e 2  - 

=  J[c,  r,  Sdaia]  +  eSJ  +  y<52  J  +  •  •  ■ 

In  view  of  the  goal,  enunciated  at  the  end  of  the  last  section,  to  examine 
the  perturbation  of  a  consistent  solution  due  to  perturbing  the  data,  we  shall 
assume  that 

J'lc >  r<  Sdata\  =  °- 

That  is,  that 

S[c,  r]  =  Sdata  and  C[c,  r]  =  0.  (9) 

Let  us  state  very  carefully  the  meaning  of  the  condition  (ii)  in  terms  of 
62J:  i.e.  62J  is  positive.  Condition  (ii)  actually  requires  that  62J  be  “positive 
relative  to”  6c,  Sr:  for  some  L  >  0, 

I||62j|||>l(M2  +  IIHII2)-  (io) 

Indeed,  as  follows  from  the  proof  of  the  implicit  function  theorem,  as  L  — *•  0  in 
(10),  the  size  of  the  region  of  convexity  (within  constrained  models)  of  J  goes  to 
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zero,  and  the  possible  ratio  of  solution  error  to  data  error  goes  to  infinity.  Thus 
we  need  not  only  L  >  0,  but  some  positive  control  over  L,  to  ensure  stability 
(and  computability)  of  the  solution. 

An  easy  calculation  gives 

52J  =  2|||/+^r|||2  +  2<T2|pC|||2. 

Thus  as  a  first  requirement,  we  must  have  (for  each  p) 

||/*  JUr||>  (11) 

for  some  constant  K\  >  0.  Unless  K\  is  to  be  uselessly  tiny,  this  means  in 
effect  that  each  trace  Sr  (p  =  const.)  (hence  eventually  r )  must  have  most  of  its 
energy  in  the  passband  of  the  source  /,  or  else  that  the  out-of-band  components 
must  be  constrained  a  priori.  We  adopt  the  first  option:  i.e. ,  that  we  shall 
estimate  only  passband  reflectivities.  In  quantitative  terms,  K\  should  be  at 
least  some  substantial  fraction,  say  one-half,  of  the  absolute  moment  of  the 
wavelet  (this  choice  is  justified  by  the  theory  of  convolution  operators,  and  is 
essentially  optimal): 

Kiv.bJ \f\. 

Inequality  (11)  only  shows  that  62J  majorizes  IH^HI2.  Reference  to  (10) 
shows  that  this  is  inadequate — unsurprisingly,  since  we  have  not  made  full  use 
of  the  hypotheses  of  (ii).  To  go  further,  we  must  introducd  the  linearized  con¬ 
straints  (since  otherwise  Sc  is  unconstrained).  As  we  have  assumed  c,  r,  S^a 
consistent,  i.e.  the  system  of  equations  (9),  this  simplifies  considerably.  The 
calculation  is  displayed  in  the  Appendix:  with  Ac  given  by  (5),  we  obtain  the 
identities 


SC 

=  6cC  +  6rC 

(12a) 

ScC 

=  — 2p{2  [(Q<5c)Ac- 

i  +  (/;q5c)p2av](^) 

+  [(Jo  Q^)c 

i 

j> 

?i» 

5 

N 

II 

A 

(12b) 

6rC 

=  C[c,  Sr] 

=  {i('rW)} 

°c 

(12c) 

Here  the  quantity  Q6c  is  defined  by 

QSc  :  =  [<5(v  o()]ot 

=  A3(Sc  +  c'vf0z  v£). 
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Also,  we  have  used  the  useful  notation 


gor  (z,p)  =  g{r(z,p),p) 

for  (any)  function  g(t,p)  and  similarly  for  the  symbol  o£. 

The  further  analysis  of  this  condition  in  general  is  a  little  involved.  We 
consider  in  detail  only  the  special  case 


c  (hence  v)  constant  for  z  >  zq 
Sc  (hence  6v)  =  0  for  z  <  zq. 


(13) 


A  few  remarks  about  the  general  case  appear  near  the  end  of  this  section.  See 
Figure  1. 

Under  this  restriction,  6cC  simplifies  considerably:  QSc  =  A3<5c,  and 
6cC  o  t  =  — 2pA4c-2  I" 2ScAc  + 


Now 


dp 


A2  =  2pc4A4. 


2*4 


Thus  what  remains  of  (12a)  is  an  exact  derivative  in  p,  so  we  integrate  in  p  from 
pi  to  p2  to  obtain 


C-4{25cAc+(/;6c)(Ms)}A2|^  =  f-2(*r  o  r)f|  +  £  dp{6fC  or) 

=  c_2A_2(<5r  o  r)|p’  +  dp{6sC  o  r). 

/  (14) 

Equation  (14)  will  be  the  “hook”  to  get  62J  into  an  explicit  relation  with 
Sc.  This  is  accomplished  in  three  steps.  First,  we  make  use  of  an  important 
by-product  of  the  assumed  bandlimited  feature  of  f  : 


Since  f,  hence  r,  Sr,  has  its  energy  concentrated  away  from  0 
Hertz,  the  t-derivative  of  Sr  is  at  least  as  big  (probably  much 
larger j  than  Sr  itself:  for  each  p, 


I<2\\Srf  < 


dSr 

~dt 


2 


f° 


r  a  constant  K 2  depending  on  the  lowest  passband  frequency. 


Together  with  (11),  this  gives 

K2Kl\\\Sr\\\2  <62J. 


(15) 
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The  second  step  is  to  manipulate  (14)  into  an  inequality  with  |||£r|||2  on  the 
right-hand  side.  This  is  accomplished  with  the  aid  of  several  identities  relating 
time  and  depth,  which  are  massively  simplified  by  assuming  that  c  =  const. 
where  6c  ^  0,  as  we  have  done. 

According  to  the  hypothesis  (13),  for  z  >  z0 

c(z)  =  ci  for  z  >  zq 
r(z.p)  =  r„(p)  +  fcg} 

where 

n>  (p)  =  r(z0,p) 

and 

Ai(p)  =  (1  -clp2)~l/2. 

Thus  (since  (14)  implies  6r  o  r  =  0,  2  <  2o): 

||5ror||2  =  /d2|<5r(r(2,p),p)|2 

=  |ciAi(p)/d<|i5r(<,p)|2  (16) 

=  |c1Ai(p)|j<5r||2. 

Now  set  pi  =  p,  pi  —  |pmax  +  P,  and  integrate  yjp  times  the  first  summand 
on  the  r.h.s.  of  (14)  from  p  =  0  to  p  =  |pmax  (for  fixed  z)  neglecting  the  factor 
of  c-2: 

±  /0"Pm*^PVp(Ar2^°^)|p+"Pm*x 

<  /o*'“  » x-u} 

<  fgmA‘  dP\/p  Af2|5r  o  rj 

<  (Jo  dPK3) "  (/oPra*'  dp  p\^\6r  o  r|2)  *  . 


Applying  the  same  step  to  the  absolute  value  of  the  l.h.s.  of  (14)  gives 


c74  |25cAc+  Qf  (^)}A(ci,pmax) 


(18) 


where 


*^(^1  j  Pm  ax 


3P  n 


dpy/p  ^A2(p  +  ipmax)  -  A2 


The  modulus  A(ci,pmax)  satisfies  A(0)  =  0  and  A(ci,pmax)  -^ooas  pmax  — 1 •  1/ci 
(since  the  integrand  has  a  nonintegrable  singularity  at  pmax  =  1/ci).  Otherwise 
A  is  best  investigated  numerically;  a  plot  of  A  vs.  ci,pmax  appears  as  Figure  2. 
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Clearly,  in  order  that  A  >>  0  it  is  necessary  that  cipmax  be  relatively  close  to 
1. 

Since  (17)  is  valid  with  either  sign,  we  can  square  both  sides  of  the  inequality 
we  have  just  derived  from  (14),  (17),  (18)  and  make  an  obvious  estimate  of  the 
second  term  on  the  r.h.s.  of  (14)  to  get: 

{2tfcAc+ (/>)(*££)} 

<  A (cllPmax)-2c2  [(/0P““  dpAj-3)  /p““  dppA^\5roT\2 
+  \clpl^JZ-dpp\6CoTf}, 
integrate  both  sides  in  z ,  and  make  use  of  (16): 

]2«cAc+0»(%i)|’ 

<  A1(ci,iw)IIM|I,+  A2(c1,pm»,)IIIW||! 

where  Ai  is  given  by 

Al(ci,pmax)  =  2  C1  A(cl  i  Pmax)  J  Aj 

and 

^  /*Pmax 

^2(C1  ?  Pmax)  =  —  CjA(ci ,  Pmax)  J 

A  plot  of  Aj.  appears  as  Figure  3.  Evidently: 

In  order  that  A^  be  small,  it  is  necessary  that  Cipmax  relatively 
large  (near  1 ). 

This  is  still  not  quite  what  is  required,  as  we  have  an  estimate  on  the  left-hand 
side  of  (18),  not  on  6c  itself.  The  third  step  amounts  to  the  observation  that,  if 
6c  is  smooth  enough,  and  Ac  rough  enough,  then  this  quantity  cannot  be  small 
without  6c  itself  being  small. 

This  observation  is  the  most  physically-meaningful,  and  the  most  difficult 
to  quantify,  of  the  points  presented  in  this  paper.  As  illustrated  thoroughly  in 
Santosa  and  Symes  (1986),  the  local  behaviour  of  the  seismogram  function  near 
a  rough  model  is  quite  different  from  that  near  a  smooth  model,  and  it  is  this 
property  that  is  being  used  implicitly  at  this  point. 

A  rigorous  semiquantitative  treatment  of  roughness  is  found  in  Symes  (1988), 
and  the  present  problem  can  be  treated  along  the  same  lines.  Such  rigorous 
argument  is  neither  particularly  enlightening,  however,  nor  very  precise.  For  the 
present,  a  numerical  illustration  of  this  property  will  suffice.  The  smoothness 
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of  the  velocity  model  c  and  its  perturbation  are  guaranteed  for  example  by 
insisting  that 

c(z)  =  c0exp{£^=1x;^;(z)} 

Sc(z)  =  E/=i  6xj'Pj(z) '  c(z) 

where  {ipj}  is  a  finite,  smooth  set  of  basis  functions.  In  the  tests  reported  in 
Section  4,  cubic  6-spline  integrals  are  selected  for  the  {rpj}:  these  have  associ¬ 
ated  with  them  a  definite  length  scale  (“width”);  see  Figure  4.  Then  in  vector 
notation, 

||6cAc||2  =  SxT  R°6x 

where 

R-if  =  J  dz  ipi(z)ipj(z)c2(z)Ac2(z) 

is  the  N  x  N  symmetric  positive-semidefinite  “roughness  matrix”  associated 
with  Ac  and  the  basis  {t6j}. 

Similarly 

||6c||2  =  SxT  MSx 

and 

(I Sc)  H =  “TRmsx 

where 

=  *•)  il  *')c’w(irw) 

and  M  is  the  symmetric  positive-definite  “mass  matrix” 

Mi,  =  J  dz  ipi(z)ipj(z)c2(z). 

It  follows  that 

M(mlxll^l|2  >  ||6cAc||2  (20a) 

where  fimlx  >  0  is  the  largest  eigenvalue  of  the  ^-dimensional  generalized  eigen¬ 
value  problem 

R(0)y  =  nMy.  (20b) 

Likewise, 

u'sc)d-ti  <2oc> 

where  /ij^n  is  the  least  eigenvalue  of  the  generalized  eigenvalue  problem 

R^y  =  pMy.  (20d) 
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Finally, 


PminINI2  <  ||2<5cAc  +  (jf  Szj  ^||2  (20e) 

where  p  is  the  least  eigenvalue  of  the  generalized  eigenvalue  problem 

Ry  :=  (2 +  R(l))y  =  y.My.  (20f) 

Combining  (19),  (20a),  (20c)  and  (20d)  we  get 

Pmin||^c||2  <  A1(ci,pmax)|j|6r|||2  +  A2(ci  ,pmax)|||<$C|||2  (21a) 

and 

Pm,n  >  (tiL  -  Vm°lx)-  (21b) 

Obviously  inequality  (21b)  has  force  only  if  >>  /imL,  that  is,  9A c/dz  is 
much  bigger  than  Ac,  whence  Ac  must  be  rough.  In  fact,  Ac  must  be  uniformly 
rough  on  the  length  scale  of  significant  variation  in  c.  To  see  this,  note  that 
the  worst  possible  situation  is  when  f?(°)  and  Ri1)  have  a  common  null  vector, 
since  then  the  left-hand  side  of  (19)  constrains  the  corresponding  component 
not  at  all.  This  can  indeed  occur.  Note  that  the  integrated  spline  basis  {ipj}  is 
so  constructed  that  ipj  —  tpj+i  vanishes  outside  an  inteval  Ij,  encompassing  five 
spline  nodes:  see  Figure  5.  Suppose  that  for  some  j,  Ac  =  0  in  the  interval  Ij : 
that  is,  there  are  no  reflectors  in  Ij.  Set  6xj  —  1,  6xj+i  =  -1,  6xi  =  0  for  i  ^  j. 
Then  Rfj) 6x  =  0,  j  —  0,  1  so  p  =  0  with  eigenvector  8x  for  both  matrices. 
Thus: 


In  order  that  (i(^n  >  >  0,  significant  reflectors  must  be 

present  in  every  depth  interval  of  the  characteristic  length  scale 
of  the  smooth  model  class. 

In  fact,  it  turns  out  that  this  condition  is  also  necessary  in  order  that  the 
“combined”  eigenvalue  fimin  be  reasonably  large,  as  we  shall  see  below. 

Note  also  the  connection  with  the  wavelet  passband.  In  order  for  the  convo¬ 
lutional  model  to  be  accurate,  c  must  have  almost  all  of  its  energy  concentrated 
below  the  passband  (or  rather  its  spatial  analogue):  that  is,  c  must  be  smooth 
on  the  spatial  wavelength  scale.  Suppose  that  c  is  chosen  to  attain,  roughly,  the 
maximum  degrees  of  freedom  permitted  by  this  constraint.  Thus  the  length  scale 
associated  with  the  background  model  is  roughly  the  largest  spatial  wavelength 
in  the  data,  and  we  can  re-phrase  the  above  conclusion  as: 

In  order  that  (19)  above  constrain  6c,  significant  reflectors  must 
be  present  in  every  depth  interval  longer  than  the  longest  spatial 
wavelength  in  the  data. 
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This  necessary  condition  is  actually  also  sufficient,  when  made  appropriately 
precise  (Symes  1988;  see  also  Santosa  and  Symes  1986).  We  illustrate  the  suf¬ 
ficiency  by  introducing  the  reflector  sequence  Ac  figuring  in  the  experiments 
of  Section  4.  See  Figure  6.  We  have  computed  the  extreme  eigenvalues  of  the 
problems,  (20b),  (20d),  (20e).  For  JV-dimensional  integrated  spline  space,  the 
characteristic  length  scale  (i.e.  the  length  of  Ij)  is  proportional  to  l/N.  In 

Figure  7,  we  tabulate  /imax,  /^min>  an^  ^min  against  N,  along  with  the  charac¬ 
teristic  length  scale;  for  the  background  velocity  profile  we  have  taken  the  profile 
in  Figure  6  also.  We  have  not  restricted  the  non-zeros  of  6c  to  the  c  =  const. 
segment,  so  this  is  a  more  severe  test  than  warranted  by  the  present  discussion. 
Evidently,  the  criterion  above  is  indeed  sufficient  to  guarantee  fj.^n  —  4/imax  >  0 
in  this  case  and  appears  necessary  to  have  fimm  »  0. 

We  now  combine  the  conclusions  of  three  steps:  (13),  (19),  (20e),  to  get 


—  II^H2  <  62J, 


7  = 


Aj  A2 
K2K2  +  <r2 


and,  using  (15)  again 


min  {^K2Kl  (||*c|j2  4-  Prill2)  <  S2J  (22) 

which  gives  an  explicit  estimate  for  the  constant  L  in  (10),  as  required.  Note  that 
this  constant  of  proportionality  is  neatly  separated  into  ^m;n,  which  depends 
only  on  the  roughness  of  Ac,  relative  to  6c,  and  7,  which  depends  on  aperture 
and  passband  amplitude,  and  on  the  penalty  parameter  cr. 

Note  the  role  of  cr  in  (22):  it  is  only  the  quotient  A2/<72  which  influences  the 
lower  bound.  The  aperture-dependent  Ai  and  A2  are  of  roughly  the  same  size, 
so  the  requirement  that  the  lower  bound  be  reasonable  imposes  a  (rather  loose) 
relation  between  the  aperture,  the  passband  amplitudes  (K\,K2),  and  cr:  in 
effect,  a  (soft)  floor  is  placed  under  cr.  On  the  other  hand,  the  upper  limit  of  the 
spectrum  of  62J  clearly  grows  with  cr2.  Thus  for  “moderate”  values  of  a,  the  62  J 
is  as  well-conditioned  as  possible.  Several  possibilities  exist  for  pinning  down 
this  range:  guessing  (the  option  pursued  in  numerical  work  reported  here), 
rigorous  theoretical  estimation  (far  too  conservative),  or  adaptive  estimation 
during  iterative  solution  of  the  linear  stage  of  Newton’s  method  (a  project  for 
the  future). 

Recall  that  the  argument  leading  to  this  conclusion  was  based  on  the  restric¬ 
tion  (13).  Without  this  restriction,  the  argument  becomes  more  involved,  but 
the  conclusion  is  qualitatively  the  same:  for  sufficiently  rough  Ac,  an  inequality 
like  (22)  holds.  It  is  evident  from  the  form  of  the  expression  Q6c  that  c'  ^  0 
will  degrade  the  contribution  of  large— p  traces  to  the  lower  bound,  and  this  is 
indeed  the  main  quantitative  effect.  A  rigorous  argument  of  this  nature,  in  the 
context  of  the  fully  nonlinear  problem,  can  be  found  in  Symes  (1988). 
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4  Numerical  Experiments 


In  this  section  we  report  an  implementation  of  the  algorithm  suggested  in  Sec¬ 
tion  2,  and  the  results  of  some  numerical  experiments  which  establish  its  feasi¬ 
bility.  More  extensive  tests  of  the  method  will  be  reported  elsewhere. 

The  implementation  requires  (1)  a  choice  of  discretization  of  the  parameter 
space  and  operators;  (2)  a  choice  of  optimization  strategy.  We  give  sufficient 
detail  in  both  areas  that  the  interested  reader  should  be  able  to  reproduce  our 
results. 


4.1  Discretization 

Both  to  maintain  applicability  of  the  convolutional  model,  and  for  the  more 
subtle  reasons  given  in  Section  3,  it  is  necessary  that  the  background  velocity  c 
be  quite  smooth — much  smoother  than  the  reflectivity,  for  example.  A  simple 
explicit  way  (though  certainly  not  the  only  way)  to  ensure  a  given  degree  of 
smoothness  is  to  choose  c  from  a  finite-dimensional  function  space  spanned  by 
smooth  functions. 

For  the  space  velocity  profiles  we  took  a  manifold  of  exponentiated  integrated 
6-splines  (see  Figure  4): 


c(z)  =  Co  exp 


nmod 

dz'  xi'Pi{z') 


i=  1 


where  ipi(z)  =  ynspl-  ( '  ZspV  )  )  ’  ^  a  s*'an^arc^  6-spline,  and  Zi  =  zspl, 
i  =  0, . . . ,  nspl  are  nspl  +  1  evenly  spaced  nodes.  We  set  nmod  =  nspl  —  3,  so 
that  all  summands  vanish  at  z  =  0  and  z  =  zspl.  See  Figure  4. 

The  surface  velocity  cq  was  regarded  as  a  fixed  parameter. 

The  perturbations  6c  have  the  form 


6c(z)  = 


c{z). 


When  values  z  >  zspl  are  needed,  both  c  and  6c  are  regarded  as  constant  and 
equal  to  their  values  at  zspl. 

The  depth  functions  occurring  in  the  various  formulae  are  sampled  on  a  fixed 
grid  {j  dz  :  0  <  j  <  nz},  where  nz  ■  dz  =:  zmax  >  zspl  (“the  z-grid”).  Routines 
were  written  which  convert  sampled  z-grid  functions  (c,  6c)  into  spline  coeffi¬ 
cients  ( x,6x )  and  vice  versa,  and  which  satisfy  certain  adjointness  conditions 
detailed  below. 
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Time-slowness  sections  are  regarded  from  the  outset  as  sampled  on  a  fixed 
grid  of  size  ( nt  +  1)  x  np,  at  given  sample  intervals  dt  and  dp.  We  used  one-way 
time  as  the  time  parameter  throughout. 


4.2  Operators 


The  formulas  for  the  incoherency  C  and  its  perturbation  8C  from  the  appendix 
and  Section  2  require  the  travel-time  change-of-variable  and  its  inverse.  These 
were  accomplished  via  interpolation.  For  example,  the  integral 


dz' 

v(z',p) 


is  approximated  using  the  trapezoidal  rule  on  the  z-grid,  yielding  an  unequally 
spaced  set  {tv}  of  travel-times.  The  time-function  to  be  converted  to  a  depth 
function  must  then  be  evaluated  at  the  r,-,  which  is  done  via  piecewise-linear 
interpolation.  The  total  process  has  a  truncation  error  on  the  order  of  dz2. 

Derivatives  with  respect  to  z,  t  and  p  are  replaced  by  simple  3-point  centered 
difference  formulas,  maintaining  the  truncation  order.  Discretization  of  C  and 
6C  was  accomplished  by  means  of  the  formulas  indicated  in  the  Appendix. 

The  synthetic  p-tau  seismogram  5[c,  r]  was  computed  using  a  centered  3- 
point  <-difference  and  trapezoidal  rule  approximation  of  the  convolution  integral. 


4.3  Norms 


The  definition  of  the  cost  function  J  involves  the  L2-norm.  As  noted  in  Section 
2,  this  should  really  include  a  factor  of  p,  to  most  closely  maintain  the  relation 
with  mean-square  error  in  (z,<)  domain.  To  simplify  our  calculations,  however, 
we  ignored  this  point  and  defined  the  section  L2  norms  by 


nt  np 

lllFlll2  =  ^Z^2dtdPWtU)wp(k)\Fik\2 


j= 0 k= 1 


where 


w,(j)  = 
wp(k)  = 


1/2  if  j  =  0,  nt 
1  else 

1/2  if  k  =  1,  np 
1  else 


i.e.  the  trapezoidal  rule.  With  this  choice,  J  is  completely  defined. 

The  importance  of  choice  of  norms  in  the  model  space  [c,  r]  cannot  be  overem¬ 
phasized.  Even  though  the  definition  of  J  is  independent  of  this  choice,  it  is 
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the  main  factor  affecting  the  efficiency  of  the  optimization.  This  is  principally 
because  of  the  role  of  the  model  norm  in  the  definition  of  the  gradient.  We 
examine  this  point  for  the  summand  <r2|||C[c,  r] 1 1 1 2  of  J,  since  it  involves  both 
c  and  r.  By  definition,  the  gradient  of  this  term  is  the  (unique)  model  vector 
[c,  f]  for  which  for  any  (Sc,  Sr), 


lims_0  i  (<72|||C[c,  +eSc,  r  +  e6r]|||2  -  <72|||C[c,  r]|||2) 
=  ((c,r),(Sc,Sr))M 


(23) 


where  <  ,  >m  is  the  scalar  product  in  model  space  corresponding  to  the  norm: 

\\[6c,  Sr] \\2M  =  ([5c,  <5r],  [Sc,  5r])M- 

(We  assume  that  model  space  is  a  Hilbert  space,  since  all  efficient  smooth  opti¬ 
mization  methods  are  predicated  on  this  assumption.) 

A  principal  requirement  ((i)  in  Section  3)  is  that  the  function  [c,  r]  — ► 
|||C[c,  r]|||2  is  regular,  i.e.,  differentiable.  Examining  (A2),  we  see  that  deriva¬ 
tives  of  Sr  are  involved  in  SC.  Since  r,6r  are  to  be  allowed  to  be  arbitrary 
(grid-representable)  functions,  SC  can  ony  be  continuously  dependent  on  Sr, 
as  is  required  by  regularity,  if  the  model  norm  includes  explicit  control  over 
derivatives  of  Sr.  The  obvious  choice  for  the  section  part  of  the  model  norm  is 

INI?  =  E?=oYlnklidtdPwt(j)wp(k)  {N*|2 

+\Dt6rjk\2  +  \DpSrjk\2} 

where  Dt  and  Dp  are  2-point  one-sided  difference  approximations  for  d/dt  and 
d/dp.  The  subscript  “1”  stands  for  “first  derivatives”;  this  is  the  discrete  version 
of  the  first  in  the  Sobolev  scale  of  norms,  a  basic  tool  in  modern  analysis  of 
partial  differential  equations. 

For  the  velocity  profile  part,  i.e.  Sc,  we  can  make  use  of  the  fact  that  C  is 
required  to  belong  to  a  space  of  smooth  splines. 

We  enforce  the  membership  of  c  in  the  spline  manifold  by  parameterizing  the 
model  in  terms  of  the  spline  coefficients  x,-  themselves,  rather  than  the  2-grid 
values  of  c.  Also,  we  tacitly  use  logc  and  its  perturbation,  Sc/c,  as  fundamental 
quantities  rather  than  c  and  Sc\  this  nondimensionalizes  that  part  of  the  model 
(note  that  r  is  already  non-dimensional,  by  definition).  Thus  we  need  to  express 
the  norm  of  Sc/c  in  terms  of  the  5x,-;  this  is  easily  accomplished  via  the  mass 
matrix 


This  is  computed  via  the  trapezoidal  rule,  of  course.  Then 

=  SxtM6x. 
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Note  that  we  are  measuring  rather  than  ^  itself.  Since  6c(0)  =  0,  a 
bound  on  the  former  implies  a  bound  on  the  latter,  but  the  choice  we’ve  made 
here  weights  the  more  oscillatory  velocity  perturbations  more  highly,  and  thus 
tends  to  rotate  the  gradient  in  the  direction  of  smoother  Sc’ s,  with  favorable 
computational  consequences. 

Now  regarding  the  model  as  the  pair  [z,r]  (rather  than  [c,  r]),  the  model 
norm  is  taken  to  be 


IIP*.  Sr]\\ m  =  nc6xT  M8x  +  tir\\Sr\\\. 

Adjustment  of  the  weights  r  allows  the  gradient  to  be  rotated  in  the  x-  or 
r-directions;  this  scaling-preconditioning  turns  out  to  be  important  in  achieving 
rapid  convergence. 


4.4  Gradients,  Hessians 


First  examine  the  incoherency  component  of  J,  as  before.  To  write  the  result 
in  a  revealing  way,  recall  that  SC  is  linear  in  [<5x,  6r],  and  write  6C  ■  [6x,  6r]  for 
the  value.  Then  the  limit  on  the  l.h.s.  of  (23)  can  be  carried  out  to  give 

2<T2{6C-[6x,6r),C{x,r])L>  = 
([Sx,Sr],2a2SC*C[x,r\)M 

where  the  adjoint  operator  SC *  is  defined  by  the  condition 

(6C[6x,6r},F)L>  =  ([6x,6r],6C'  ■  F)m 


(24) 

(25) 


which  is  to  hold  for  arbitrary  model  perturbations  [<5x,6r]  and  (<,p)-sections  F. 
Comparison  of  (23)  and  (24)  reveals  that 


grad((72|||C|||2)  =  [i.f] 

=  2tr2SC*  ■  C. 


(26a) 


Likewise,  the  Gauss-Newton  approximate  Hessian  operator  (Dennis  and  Schn¬ 
abel  (1983),  §10.2)  is  given  by 

Hess(<72|||C|||2)  •  [Sx,  $r]  =  2 cr2SC*  ■  SC  ■  [6x,  Sr).  (26b) 

Similar  formulas  hold  for  the  term  |||S—  S(ja^a |||2. 

The  calculations  (26)  are  the  principal  parts  of  the  quasi-Newton  methods 
to  be  introduced  below.  Thus  efficient  and  accurate  calculation  of  the  adjoints 
SC* ,  6S*  are  essential  to  successful  optimization. 
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4.5  Adjoints 

The  definition  (25)  of  SC*  must  be  taken  absolutely  seriously.  That  is,  even 
though  SC  is  given  (in  principle)  by  an  enormous  matrix,  SC*  is  not  simply 
the  operator  defined  by  the  matrix  transpose  of  SC.  SC*  is  related  to  the 
matrix  transpose,  however,  and  this  relation  provides  a  convenient  avenue  for 
computing  SC* . 

The  scalar  products  involved  in  (25)  may  be  written  symbolically  in  the  form 

(. X,Y)g=XtGY 

where  X  and  Y  are  parameter  vectors  and  G  is  the  Gram  matrix  of  (  ,  )q. 
Thus  G  is  a  positive-definite  symmetric  matrix  of  size  equal  to  the  dimension 
of  the  parameter  space. 

In  (25),  two  essentially  different  inner  products  are  involved,  on  two  different 
parameter  spaces  (models  [<5a;,  <5r],  sections  F),  as  well  as  a  linear  transformation 
(SC)  mapping  one  parameter  space  into  the  other.  Accordingly,  consider  two 
inner  products  of  the  form  given  above,  with  Gram  matrices  Gv,  v  =  1,2, 
and  a  linear  transformation  A,  given  by  a  matrix  of  appropriate  dimensions, 
mapping  one  parameter  space  into  the  other.  The  adjoint  of  A  is  defined  by  the 
abstraction  of  (25): 

(AXuX7)Gl  =  (X1}A'X2)g, 

for  arbitrary  Xv  in  the  Vth  parameter  space,  u  =  1,2.  Written  in  matrix  form, 

(AX\)tG\X2  =  X^G2A*X2 

from  which  it  is  clear  that,  as  matrices, 


A*  =  G?AtGi. 


(27) 


To  see  what  (27)  implies  for  (25),  write  SC  in  components: 

SC  =  [SXC,  6rC\. 


This  is  the  correct  matrix  representation  if  the  model  perturbation  is  written  as 
a  column  vector: 


Sx 


Sr 


SXC  ■  Sx  +  SrC  ■  Sr. 


Thus 


where 

(SXC  ■  Sx,  F)L7  =  SxT M(SXC*  ■  F)  (28a) 
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and 


(6rC-6r,F)L,  =  (6r,6rC*  ■  F)v  (28b) 

Identify  a  section  F  with  a  vector  in  any  convenient  fashion,  e.g.  by  listing 
the  traces  (columns)  sequentially.  Then  the  L2-inner  product,  discretized  by 
the  trapezoidal  rule,  is  realized  by  the  scaling  matrix  S  (which  scales  the  edge 
entries  by  1/2,  the  corner  entries  by  1/4,  and  everything  else  by  1): 

(Fi,Fi)l,  =  FfSFi 


for  any  sections  F\,  F?. 

Thus  (27)  applied  to  (28a)  gives 

6XC *  =  M~l6xCT  S.  (29a) 

For  (28b),  it  is  necessary  to  write  the  “Sobolev”  inner  product  (  ,  )i  in  the 
canonical  form  given  above;  its  Gram  matrix  turns  out  to  be  exactly  the  matrix 
of  the  discrete  Neumann  problem  for  the  usual  five-point  discretization  of  the 
Laplace  operator,  which  we  shall  denote  by  N.  Thus 

SrCm  =  N~l6rCT  S.  (29b) 

In  principle,  this  completes  the  calculation  of  the  adjoints,  hence  of  the  gradient 
and  Hessian.  In  practice,  inspection  of  (26a,  b)  shows  that  we  need  only  routines 
which  apply  6C*  to  a  section,  not  the  entire  matrix  of  6C*.  This  extremely 
important  observation  saves  much  computational  effort  and  storage.  In  fact, 
application  of  the  trapezoidal  scaling  operation  represented  by  S  is  trivial,  and 
the  transpose  operations  6XCT ,  6rCT  are  relatively  easy  to  work  out,  as  the 
same  sort  of  recurrence  rules  that  form  the  “forward”  calculations  of  SXC,  6rC. 
Thus  we  represent  (29a,  b)  in  the  alternate  forms:  for  an  arbitrary  section  F, 


ScC *  •  F  =  x 

Mi  =  SxC^SF 


(30a) 


irC*-F  =  r 

Nr  =  6rCTSF 

We  have  just  described  how  to  compute  the  right-hand  sides  of  the  second  equa¬ 
tions  in  each  of  these  pairs.  To  solve  the  linear  system  with  the  spline  mass  ma¬ 
trix  M,  we  used  a  standard  linear  equation  solver  (LINPACK:  SPOFA,  SPOSL), 
as  the  spline  space  is  small-dimensional — i.e.  the  background  model  has  few  de¬ 
grees  of  freedom,  <  20  in  all  of  our  experiments.  To  solve  the  discrete  Neumann 
problem,  which  is  quite  large  ( nt  =  300,  np  =  40  in  some  experiments),  we  took 
advantage  of  explicit  knowledge  of  the  discrete  Neumann  eigenfunctions  (tensor- 
product  cosines)  to  design  an  FFT-based  discrete  Neumann  solver,  which  solves 
the  second  member  of  (30b)  very  efficiently. 
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This  step  would  have  been  more  difficult  if  the  factor  p  had  been  included 
in  the  integral  defining  {  ,  )i,  as  it  should  be.  Then  N  would  include  a  p- 
difference  approximation  to  the  Bessel  operator  of  order  zero,  with  Neumann 
condition  at  p  =  pmax  (a  residue  of  the  cylindrical  geometry  implicit  in  the 
proper  definition  of  the  Radon  transform).  Thus  fast  Bessel  transform  software 
would  be  required. 

4.6  Interface  with  Optimizer 

Since  the  optimizer  described  below  accepts  standardized  n-vector  arguments,  it 
was  necessary  to  “bundle”  the  computations  just  described  into  procedures  with 
standard  calling  sequences.  We  used  a  pack/unpack  routine,  which  collapses  the 
spline/section  pair  [<5x,  6r]  into  a  vector  of  length  nspl  +  (nt  +  1)  *  np,  and  vice 
versa. 

4.7  Choice  of  Optimizer 

The  tests  reported  below  were  made  using  a  so-called  truncated  Newton  code. 
This  code  is  based  on  the  model  trust  region  principle  (Dennis  and  Schnabel 
(1983),  section  6.4)  and  on  the  extensions  to  it  introduced  by  Steihaug  in  his 
Yale  thesis  (Steihaug  (1981)).  Essentially,  the  Gauss-Newton  linear  step 

Hess  J  ■  [6x,  6r]  =  —grad  J 

is  solved  by  a  conjugate  residual  iteration  (Golub  and  Van  Loan  (1983),  Ch.  10), 
which  is  terminated  when  the  step  estimate  exits  a  ball  about  the  current  solu¬ 
tion  estimate,  the  radius  of  which  is  determined  by  a  simple  and  robust  updating 
strategy.  This  expedient  avoids  expensive  conjugate  residual  steps  taken  outside 
the  region  in  which  the  linearized  model  can  be  “trusted”,  hence  the  name.  A 
more  lengthy  description  of  the  code  can  be  found  in  Santosa  and  Symes  [1986], 
Chapter  9,  where  the  same  codes  were  used  in  solving  the  nonlinear  output 
least-squares  problem,  using  finite  difference  synthetic  seismograms  instead  of 
the  convolutional  model. 

An  important  amendment  of  the  trust  region  idea  is  natural  in  this  problem. 
All  models  in  the  iteration  are  supposed  to  have  a  fixed  rectangle  [0,<max]  x 
[0,PmaX]  as  a  precritical  set.  This  may  cease  to  be  true  during  an  update  step, 
if  the  velocity  is  increased  by  too  much  at  some  depth.  The  computation  of  the 
gradient  simply  flags  this  occurence,  and  the  algorithm  attempts  a  smaller  step 
in  the  same  direction.  Thus  the  trust  region,  for  problems  like  the  present  one, 
may  involve  in  a  natural  way  constraints  on  the  validity  of  the  model  itself. 
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4.8  Numerical  Experiments 


We  performed  a  number  of  numerical  experiments  using  the  velocity  profile 
c  (upper  curve)  and  perturbation  Ac  (lower  curve)  exhibited  in  Figure  6  to 
generate  the  p-tau  convolutional  model  data  of  Figure  8,  by  convolving  with  a 
Ricker  wavelet  with  center  frequency  20  Hz.  A  target  background  velocity  with  a 
lower  velocity  zone  was  chosen  because  the  structure  of  such  a  zone  is  impossible 
to  determine  from  refraction  arrival  times,  and  intrinsically  more  difficult  for 
least-squares  methods  —  see  Santosa  and  Symes  [1986].  The  velocity,  hence 
the  slowness,  were  normalized  against  the  surface  velocity,  by  changing  the 
measure  of  depth  to  normal-incidence  time  for  a  constant  background  velocity 
equal  to  the  surface  velocity.  This  step  also  normalizes  the  slowness  to  the 
range  0  <  p  <  1.  A  happy  side-effect  of  this  normalization  was  to  reduce 
substantially  the  numerical  imprecision  resulting  from  mis-scaling  inherent  in 
the  use  of  physical  units. 

The  algorithm  explained  in  the  preceding  subsections  was  used  to  extract 
estimates  of  c  and  r  from  the  data  of  Figure  8.  Parameters  common  to  all 
experiments  were 

<r=l,  ,  Pc  =  10~4,  nr  =  1. 

We  found  the  small  value  of  necessary  to  rotate  the  gradient  of  J  toward  the 
“c-”  direction. 

In  all  cases,  we  observed  the  same  pattern.  We  began  with  the  simple  esti¬ 
mate  c initio  =  const.  (=  1),  =  0.  The  first  Newton  step  did  not  change 

the  estimate  of  c,  since  the  incoherence  of  r  =  0  vanishes  for  any  velocity  model. 
Otherwise  put,  since  there  are  initially  no  reflectors,  there  is  initially  no  move- 
out  information  in  the  reflectivity  with  which  to  update  the  velocity  model.  The 
first  iteration  is  thus  devoted  entirely  to  minimizing 

ll!/*|-WIP 

which  amounts  to  deconvolving  the  data  in  a  least-squares  sense  to  find  an  initial 
(nontrivial)  estimate  for  r.  Unless  otherwise  noted,  each  Newton  step  (including 
the  first)  is  approximated  by  five  conjugate-residual  iterations. 

In  the  second  and  subsequent  iterations,  the  velocity  model  is  improved,  and 
data-noise-generated  incoherency  in  the  reflectivity  is  reduced. 

Figure  9  shows  the  velocity-estimate  results  of  five  and  ten  Newton  steps 
steps  from  both  the  constant  velocity  model  (curve  0)  and  five  Newton  steps 
from  the  velocity  model  identified  as  the  curve  1.  The  results  are  labeled  as 
curves  2,  3  and  4  respectively.  In  all  cases  we  used  8  spline  nodes  (thus  the 
velocity  is  determined  by  5  parameters).  This  result  is  evidence  for  the  inde¬ 
pendence  of  initial  estimate  of  the  final  estimates.  In  particular,  the  results  of 
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ten  iterations  from  a  constant  initial  model,  and  five  iterations  from  the  “in¬ 
correct  trend”  initial  model  (curve  1)  are  virtually  identical.  The  error  is  quite 
stable  and  is  due  to  the  fact  that  the  target  is  not  a  member  of  the  space  of 
velocities  defined  by  the  eight-node  spline  basis  —  i.e.  we  get  the  “closest” 
eight-node  velocity  estimate. 

These  estimates  are  reasonably  accurate,  especially  considering  the  compu¬ 
tational  work  required.  Apparently  the  incoherence  resulting  from  the  erroneous 
basement  velocity  was  insufficient  to  cause  further  corrections,  or  possibly  the 
calculation  of  the  incoherence  is  substantially  inaccurate  there  —  see  comments 
below.  Most  of  the  reduction  in  J  (about  an  order  of  magnitude)  came  in  the 
first  iteration,  in  which  the  data  is  deconvolved.  Yet  another  order  of  magnitude 
is  gained  in  the  remaining  iterations,  in  which  the  incoherence  is  reduced. 

Two  major  points  have  emerged  from  the  experimental  work  conducted  thus 
far.  The  first  concerns  the  number  of  spline  nodes:  the  outcome  is  in  some 
ways  quite  sensitive  to  this  number.  For  example,  the  experiment  of  Figure 
9  was  repeated  with  16  nodes  instead.  Five,  ten,  and  thirty  iterations  of  the 
Gauss-Newton  process  produced  the  curves  labeled  1,  2,  and  3  in  Figure  10. 
These  look  quite  “wild,”  and  certainly  the  mean-square  error  is  much  greater 
than  is  the  case  with  those  in  Figure  9.  Recall  however  that  the  principal  role 
of  the  background  model  is  to  supply  travel-times.  A  quite  different  picture 
emerges  when  the  travel-times  are  plotted  against  the  “true”  travel  time  curve 
(from  the  velocity  profile  of  Figure  6).  In  Figure  11  are  displayed  (normal- 
incidence)  travel-time  curves  from  the  “exact”  velocity,  curve  1  from  Figure  9, 
and  curves  1  and  2  from  Figure  10.  In  fact,  the  latter  two  curves  are  closer 
to  the  “true”  travel-time  than  is  the  former,  despite  their  correspondence  with 
velocity  estimates  having  larger  L2-error.  This  relation  emerges  more  clearly 
when  the  (normal  incidence)  travel-time  errors  are  plotted:  see  Figure  12. 

Apparently,  the  result  of  increasing  the  number  of  degrees  of  freedom  in  the 
model  is  to  allow  a  closer  fit  to  to  the  travel-time,  at  least  at  points  correspond¬ 
ing  to  major  reflectors,  but  at  the  cost  of  an  oscillatory  error  which  may  be  large 
in  the  mean-square  sense.  This  is  easy  to  understand:  the  errors  oscillate  on  a 
length  scale  too  short  to  affect  the  travel-times  between  major  reflectors,  hence 
correspond  to  small  eigenvalues  of  the  incoherence  Hessian.  While  the  effect 
on  travel-times  of  this  sort  of  error  is  a  priori  small,  it  does  produce  irritat¬ 
ing  ambiguities  in  the  velocity  estimate,  and  (more  important)  has  a  negative 
impact  on  the  convergence  of  the  iterative  scheme.  Several  approaches  to  the 
removal  of  this  ambiguity  suggest  themselves.  Trial-and-error  determination 
of  the  optimal  spacing  for  spline  nodes,  as  has  been  done  here,  is  clearly  un¬ 
satisfactory.  Systematically  increasing  the  number  of  nodes  until  a  good  fit  is 
obtained  requires  some  notion  of  an  “acceptable”  level  of  fit,  and  such  infor¬ 
mation  may  itself  only  be  obtainable  by  trial-and-error.  A  more  satisfactory 
approach  might  be  adaptive  estimation  of  small  Hessian  eigenvalues,  through 
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the  close  relation  of  conjugate-residual  iteration  with  the  Lanczos  algorithm 
(Golub  and  Van  Loan  [1983],  Ch.  10),  and  penalization  of  the  corresponding 
eigenvector  components.  For  the  application  of  such  “iterative  deflation”  to  lin¬ 
ear  systems  see  Chan  [1986]  (also  Meza  and  Symes  [1987]).  Since  the  number  of 
small  eigenvalues  associated  with  velocity  perturbations  is  small,  and  since  their 
characterization  is  somewhat  independent  of  the  current  velocity  estimate,  this 
deflation  strategy  should  work  rather  well  in  combination  with  Gauss-Newton 
iteration.  Finally,  since  the  culprit  is  oscillatory  error,  penalization  of  a  velocity 
derivative  may  regularize  this  problem  satisfactorily.  Computational  trials  are 
in  progress;  results  will  be  reported  elsewhere. 

A  second  point  concerned  the  density  of  p-samples.  Inspections  of  Figure  8 
clearly  show  that  for  large  p  and  t,  the  moveout  difference  in  neighboring  traces 
may  be  a  substantial  fraction  of  a  wavelength.  As  we  have  based  our  difference 
approximations  to  the  incoherency  on  centered  difference  approximations  to 
the  coordinate  derivatives  Jj,  the  possibility  exists  of  severe  undersampling 
in  p  .  In  fact  when  we  increased  A p  to  .02  (from  .01  as  in  Figure  8),  the 
computation  analogous  to  that  for  Figure  9  gave  completely  erroneous  results 
for  the  deeper  part  of  the  velocity  profile,  apparently  because  the  part  of  the 
incoherency  due  to  deeper  reflectors  was  grossly  underestimated.  We  suspect 
that  residual  inaccuracy  in  the  deeper  parts  of  the  curves  in  Figure  9  is  due  to 
a  milder  version  of  the  same  effect. 

Besides  finer  sampling,  methods  to  overcome  errors  in  incoherency  due  to 
undersampling  include  higher-order  difference  formulas  and  difference  formulas 
better  adapted  to  the  moveout.  Indeed,  low-order  differences  along  even  a  crude 
approximation  to  the  correct  moveout  curve  should  produce  better  results  at 
coarser  sampling  than  the  coordinate  derivatives  used  in  our  present  code.  These 
ideas  are  also  under  investigation. 

It  might  be  objected  that,  while  the  output  includes  an  estimate  of  the 
travel-time  reflectivity  section  r(f,p),  no  estimate  of  the  corresponding  velocity 
perturbation  Ac(z)  is  provided.  The  final  reflectivity  is  not  necessarily  entirely 
coherent,  and  so  does  not  correspond  to  any  Ac(z),  strictly  speaking.  Nonethe¬ 
less,  an  estimate  may  be  produced  by  stacking  r(t,p)  on  the  basis  of  equation 
(5),  i.e. 

Ac(z)  ~  — f  dp  v-2(z,p)r(r(z,p),p). 

Pmax  Jo 

This  output  might  truly  be  regarded  as  the  final  image  produced  by  an  iterative 
before-stack  migration,  specialized  to  constant-density  acoustics. 

We  have  implemented  this  post-inversion  stack  of  the  final  reflectivity  es¬ 
timate,  and  display  the  results  for  the  reflectivity  corresponding  to  curve  3  in 
Figure  9.  When  stacked  with  constant-velocity  moveout,  the  estimate  of  Ac(z) 
is  completely  erroneous  in  phase  and  wrong  by  a  factor  of  perhaps  3  in  ampli¬ 
tude  (Figure  13).  When  stacked  with  the  final  velocity  estimate  from  Figure  9 
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(curve  3),  the  estimated  Ac(z)  has  essentially  correct  phases  and  amplitudes 
for  the  major  events  (Figure  14),  as  compared  to  the  Ac(z)  actually  used  to 
generate  the  data.  Of  course,  none  of  the  subwavelength-scale  variation  in  the 
true  Ac(z)  could  appear  in  the  reflectivity  r(t,p)  or  in  the  stacked  estimate,  so 
a  more  interesting  comparison  is  the  p-tau  section  generated  using  the  velocity 
from  Figure  9  and  the  stacked  Ac(z)  from  Figure  14.  This  is  displayed  in  Figure 
15,  and  the  difference  of  Figure  15  and  Figure  8,  plotted  on  the  same  scale,  in 
Figure  16. 


4.9  Summary 

We  have  described  a  preliminary  implementation  of  the  coherency  optimiza¬ 
tion  method,  exhibited  the  results  achievable  with  this  rather  crude  code,  and 
suggested  some  issues  worthy  of  further  examination.  Given  the  rather  large  dis¬ 
cretization  errors  in  our  present  implementation,  which  play  the  role  (at  least) 
of  data  noise,  the  stability  of  the  final  estimates  and  the  rate  of  convergence 
both  appear  quite  satisfactory:  to  a  limited  extent,  we  appear  to  have  satisfied 
the  criteria  (a) — (c)  stated  in  Section  2. 

Extensive  noise  studies,  tuning,  modifications  along  the  lines  suggested  above, 
and  direct  comparison  with  output  least-squares  optimization  will  be  reported 
in  a  future  publication. 

5  Coherency  Optimization  for  Laterally  Heterogeneous 
Models 

The  essential  ingredients  of  the  approach  to  velocity  inversion  sketched  in  Sec¬ 
tion  2  were: 

(i)  parameterization  of  the  reflectivities  as  time  “sections”  (i.e.  traces), 
so  that  the  seismogram  is  a  regular  function  of  the  reflectivities,  one 
reflectivity  (trace)  per  plane-wave  “shot”  (synthetic  source); 

(ii)  referral  of  each  time-section  reflectivity  to  depth,  and  assessment  of 
the  dependence  of  the  resulting  suite  of  depth  sections  on  the  “shot” 
parameter  (i.e.  slowness). 

In  this  section,  we  maintain  the  fiction  that  the  seismogram  is  adequately 
approximated  by  the  multi-dimensional  version  of  the  convolutional  model,  i.e., 
the  linearization  about  a  smooth  background  velocity,  and  moreover  apply  high- 
frequency  asymptotics  freely.  A  natural  interpretation  of  (i)  and  (ii)  emerges  in 
this  context. 
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First  recall  why  the  parameterization  by  two-way  time  resulted  in  regular¬ 
ity  of  the  seismogram  as  a  function  of  both  the  background  velocity  and  the 
reflectivity.  In  effect,  each  reflector  was  associated  with  the  time  of  arrival, 
at  the  surface,  of  its  reflection.  If  the  background  velocity  is  changed,  then 
depth-parameterized  reflectors  remain  fixed  while  their  reflection  times  change, 
whereas  time-parameterized  reflectors  remain  fixed  while  their  depths  change. 
In  the  former  case,  high-frequency  arrivals  are  time-shifted,  while  in  the  lat¬ 
ter  case  they  are  not.  The  time-shift  is  a  time  derivative  in  the  infinitesimal 
limit,  and  its  appearance  marks  the  loss  of  regularity  of  the  depth-parameterized 
reflectivity-to-travel-time  map.  For  time-parameterized  reflectivities,  no  such 
time  shift  occurs  as  a  result  of  background  velocity  change,  and  regularity  is 
maintained. 

Reflectors  and  reflection  arrivals  are  both  (near)-singularities,  i.e.  locii  of 
high-frequency  energy.  In  the  high-frequency  limit,  therefore,  we  must  ask:  how 
can  we  parameterize  reflectivity  so  that  a  singularity  in  the  reparameterized 
reflectivity  corresponds,  under  conversion  to  ordinary  spatial  coordinates  and 
mapping  to  the  seismogram,  to  a  singularity  in  the  same  location?  In  several¬ 
dimensional  problems,  singularities  may  have  orientations,  and  these  must  be 
preserved  as  well. 

This  question  is  answered  by  a  theorem  of  Rakesh  (Symes  (1985),  Rakesh 
(1988)),  together  with  a  construction  presented  for  this  problem  by  Beylkin 
(1985).  To  fix  ideas,  suppose  we  consider  the  linearized  acoustic  problem  for  a 
point  source,  which  models  a  shot-gather: 

1  <92u  2  _  2 r  32«o 

c2  U~  ~c2  ~dtr 

where  r  =  A c/c  is  the  “reflectivity”,  c  is  the  background  velocity,  u  is  the 
scattered  field,  and  «o  is  the  reference  field  satisfying 

h  if  "  v2uo =  _ 

x ,  being  the  source  location.  Rakesh  showed  that  for  the  impulsive  case  (/(<)  = 
6(t))  a  singularity  in  r  at  the  subsurface  location  y,  across  a  surface  element  with 
normal  77,  corresponds  to  a  singularity  in  the  reflected  field  at  receiver  location 
Xj. ,  time  tr ,  only  if  there  exist 

-  an  incident  ray  7,  associated  with  the  reference  field,  emanating  from 
the  source-point  x,  at  t  =  0; 

-  a  reflected  ray  ~fr ,  passing  over  the  receiver  point  xr  at  time  tr 

so  that  7,-  and  yr  meet  at  the  reflector  point  y  at  some  intermediate  time, 
making  equal-angles  with  the  reflector  normal  77.  For  non-impulsive  but  broad- 
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band  time  signatures  /(<),  this  rule  governs  the  arrival  of  (primary-reflected) 
high-frequency  energy. 

This  is  exactly  the  usual  picture  of  the  reflection  process,  of  course.  Rakesh’s 
theorem,  which  justifies  this  picture  in  terms  of  the  wave  equation,  holds  in 
complete  generality,  so  long  as  the  background  velocity  c  is  smooth.  It  tells 
us  exactly  which  singularity  in  the  time  section  must  be  mapped  to  a  given 
singularity  in  the  depth  section  if  the  seismogram  is  to  return  such  singularities 
to  their  original  position  and  orientation.  Beylkin’s  construction,  on  the  other 
hand,  applies  only  when  no  caustics  exist  in  the  incident  field,  i.e.  each  depth 
point  y  is  joined  to  the  source  point  x^  by  a  unique  incident  ray.  Then  any 
mapping  having  the  singularity-moving  properties  just  described  must  differ 
only  by  a  (possibly  frequency-dependent)  amplitude  modulation  (technically,  a 
pseudodifferential  operator)  from  the  Kirchhoff  migration  formula 

r(y,x,)  =  Kr(y,Zs)  /31n 

=  f  d*r  ^(£3,  y, )?(*r, y.£r ),£,)•  1 

Here  r^x^y,  x,)  is  the  two-way  reflection  phase,  i.e.  the  time  from  x,  to  y  to 
x,.,  and  «(*,,£,,  y)  is  a  slowly- varying  amplitude  modulation. 

Thus:  the  reflectivity  time-sections  r(xr,  t,  x^)  will  be  converted  to  reflec¬ 
tivity  depth  sections  r(y,Xj)  via  a  Kirchhoff  migration  formula  like  (31).  This 
implicitly  defines  the  seismogram  as  a  function  of  f(xr,t,x,),  and  guarantees 
that  it  is  regular  as  a  function  of  c,  f.  Note  the  apparent  similarity  of  (31)  to  the 
travel-time  change-of-variables,  appropriate  in  the  layered  case  (formula  (A.l), 
for  instance). 

The  second  ingredient  (ii)  in  the  coherency  optimization  approach  is  ev¬ 
idently  the  condition  that  the  depth-parameterized  reflectivities  r(y,xJ)  are 
actually  independent  of  x,  (“Every  shot  sees  the  same  earth”).  Regarding 
the  source  locations  as  filling  up  a  continuum,  this  amounts  to  the  condition 
V£  x(y,x4)  =  0.  The  composition  rules  for  oscillatory  integrals  (the  local  cal¬ 
culus  of  Fourier  Integral  operators — e.g.  Duistermaat  (1975),  Ch.  2)  give  the 
result: 

V£tr  =  V£fRr  =  K(V£>  +  P)f 

up  to  an  error  rapidly  decaying  in  frequency  content,  where  P  is  a  pseudodif¬ 
ferential  operator  in  xr  and  t,  i.e.  an  oscillatory  integral  of  the  form 


Pf(Xj.,t)  =  J  J  dxds  dkdw  e'^  ^-'  x,  s,  k,w)r(x,  s). 


The  amplitude  p  depends  on  the  ray  geometry,  i.e.  on  the  background  velocity 
c,  but  the  phase  k(xr  —  x )  +  w(t  —  s)  does  not.  Thus  (V£>  +  P)f  is  regular  as 
a  function  of  c,  i.e.  perturbing  c  does  not  result  in  the  appearance  of  higher 
derivatives  of  r,  just  as  was  the  case  for  the  formula  (A.l)  for  C[c,  r]. 
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Now  let  L  be  any  operator  from  depth-parameterized  reflectivities  to  sections 
which  has  the  reverse  effect  to  K  on  singularities:  for  instance  L  might  be  taken 
as  the  linearized  seismogram  operator  itself.  Then  another  application  of  the 
same  reasoning  shows  that 


C[c,f]  :=  LVt'Kr 

is  regular  as  a  function  of  c,  f  for  the  same  reason.  We  take  this  formula  as 
our  definition  of  the  incoherency  for  the  laterally  heterogeneous  acoustic  prob¬ 
lem.  Note  that  L  must  be  computed  to  form  the  seismogram,  and  K  is  the 
Kirchhoff  migration  operator  (or  an  equivalent).  Thus  C[c,  r]  involves  only 
well-understood  computations.  Also,  the  astute  reader  will  note  the  immediate 
resemblance  to  the  definition  of  C[c,  r]  given  in  Section  2. 

We  conclude  that  both  main  ingredients  of  coherency  optimization,  as  ex¬ 
plained  in  Section  2  for  layered  acoustics,  generalize  in  an  acceptable  way  to  a 
simple  laterally  heterogeneous  model.  Many  details  remain  to  be  settled,  some 
of  which  will  doubtless  be  crucial  to  computational  efficiency.  Also,  the  anal¬ 
ysis  of  Section  3  remains  to  be  generalized.  A  technical  complication  is  that, 
while  the  operators  L  and  K  exist  in  general,  the  composition  rules  leading  to 
the  regularity  of  the  incoherency  C[c,  f]  must  be  modified  when  caustics  are 
present.  Nonetheless,  we  have  established  that  the  coherency  approach  is  not 
conceptually  restricted  to  layered  problems. 

To  end  this  section,  note  again  that  r  — *■  Kf  serves  the  role  of  the  travel-time 
transformation.  The  first  recognition  of  the  special  role  of  travel-time  in  regu¬ 
larizing  1-d  problems  was  probably  the  work  of  Gray  (1980),  as  noted  above. 
Some  time  later,  Gray  and  Hagin  (1984)  attempted  generalized  travel-time  coor¬ 
dinates  for  laterally  heterogeneous  point  source  problems,  with  limited  success; 
travel-time  (ray-straightening)  coordinates  per  se  do  not  exist  in  general  for 
several-dimensional  problems.  Nonetheless,  the  operator  K  accomplishes  the 
principal  effect  of  the  1-d  travel-time  coordinate,  i.e.  to  make  reflector  location 
independent  of  background  velocity  in  both  the  (reparameterized)  reflectivity 
and  in  the  seismogram  simultaneously.  Of  course,  K  is  a  more  complicated  op¬ 
erator  than  a  change  of  coordinates,  except  for  1-d  (plane-wave,  layered  model) 
problems. 


6  Discussion  and  Conclusion 

6.1  The  scope  of  the  coherency  approach 

From  a  theoretical  point  of  view,  and  perhaps  from  a  practical  one  as  well,  the 
chief  defect  in  the  results  of  Sections  2  and  3  on  the  layered  acoustic  problem  is 
the  absolute  lack  of  any  provision  for  multiple  reflections.  This  defect  is  cured 
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in  the  companion  paper  Symes  [1988],  in  which  the  fully  nonlinear  bandlim- 
ited  layered  velocity  inversion  problem  is  treated  with  full  mathematical  rigor. 
We  reach  a  qualitatively  identical  conclusion  about  the  appropriate  version  of 
coherency  optimization:  that  is,  it  gives  a  (regularized)  solution  of  the  inverse 
problem  stably  dependent  on  near-consistent  data,  provided  that  sufficiently 
many  reflectors  are  present,  i.e.  that  the  target  profile  is  sufficiently  rough.  We 
give  a  precise  sense  for  “rough” ,  and  a  relation  emerges  between  stability,  aper¬ 
ture,  bandlimits,  and  roughness  (reflector  density)  very  similar  to  that  explained 
in  Section  3. 

Both  convolutional  model  and  fully  nonlinear  versions  of  other  layered  in¬ 
verse  problems  should  succumb  to  the  same  approach.  We  mention  specifically 
nonconstant-density  acoustics,  the  “marine”  elastic  model  (with  sources  and 
receivers  in  an  overlying  fluid  layer),  and  either  of  these  with  the  source  time- 
dependence  and  directivity  also  regarded  as  unknowns.  Some  idea  of  the  novel 
features  of  these  problems  in  convolutional  approximation  may  be  gleaned  from 
Sacks  and  Symes  [1987]  and  Bube,  Lailly,  Sacks,  Santosa,  and  Symes  [1987].  No 
technical  obstacles  appear  to  lie  in  the  way  of  an  analogous  treatment  of  these 
problems. 

Note  that  the  density,  regarded  as  independent  of  velocities,  will  not  be 
recovered  with  trend,  in  any  of  these  problems,  as  density  trend  perturbations  do 
not  affect  ray  geometry.  This  gross  density  ambiguity  is  well-known  (Tarantola 
1986)  and  has  been  observed  in  output-least-squares  results  (Canadas  and  Kolb, 
1986). 

The  program  outlined  in  Section  5  appears  feasible.  On  the  other  hand, 
while  an  analogous  coherency  optimization  principle  can  be  formulated  for  fully 
nonlinear  laterally  heterogeneous  models,  its  analysis  will  require  fundamental 
advances  in  the  understanding  of  wave  propagation  in  rough  media. 

6.2  Conclusion 

We  have  presented  a  novel  approach  to  the  reflection  seismic  inverse  problem, 
which  has  its  roots  in  utterly  commonplace  concepts  in  seismic  data  processing. 
We  have  formulated  this  coherency  optimization  principle  precisely  for  the  con¬ 
volutional  approximation  to  the  layered  constant-density  acoustic  model,  and 
suggested  both  analytically  and  numerically  that  its  solution  yields  accurate 
and  stable  estimates  of  both  velocity  trends  and  reflectivities,  to  the  extent  that 
these  are  determined  by  the  precritical  plane-wave  data  set  used.  Our  anal¬ 
ysis  indicated  that  coherency  optimization  should  require  markedly  less  com¬ 
putational  effort  than  output- least-squares  inversion.  Finally,  we  formulated 
an  analogous  principle  for  laterally  heterogeneous  velocity  inversion,  a  problem 
for  which  output-least-squares  inversion  is  so  inefficient  as  to  be  infeasible.  It 
remains  to  be  seen  whether  coherency  optimization  yields  a  computationally 
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tractable  approach  to  such  several-dimensional  problems. 
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Appendix.  Computation  of  a  derivative 


This  appendix  details  the  calculation  of  the  derivative  of  the  incoherency 


C[c,r]  =  ^(d  2ror)]o( 

=  -2  pr  +  [u-2A(r  or) 


°C 


=  -2pr  +  i>  2°C  (g£  -pfjjJ®) 
in  which  have  been  used  the  identities 


A, -2  _ 

dp 


v  2  =  -2 p,  &  =  pw 


dr  _ 


Clearly 

with 


dp  =  “P/o  V- 

6C  =  6cC  +  6rC 


6rC  =  C[c,  6r], 

On  the  other  hand,  an  easy  calculation  shows  that 

r(  sc 


6  C 


*  r  6c 


(A.1) 


So  from  the  third  line  in  (A.l) 


6cC 


%"2°0  [^(ror)j  °C 
-py~2°C^fi  v 


-2[»  +  ^/o^]«C[4(ror) 
~P  Ho  v-2A6c  +  v~2f*  A<5c]  oC|f. 


°C 


An  alternate  form  of  (A. 2)  follows  from  the  identity 

6fov  -  s(fo(vo02) 

=  2fo(voOHvoO  =  2fo6(vo()  or. 


So,  using  the  notation 


(A.2) 


Q6c  =  6(vo()or 
=  A3(<5c+§f 


36 


the  first  line  in  (A. 2)  may  be  re-written 


6cC  =  -2  v  3Q5c^(ror)j  oC 
-2  Pv~2°C&tiQ6c. 


(A.2') 


For  section  3  we  require  an  evaluation  of  6cC  when  r  is  coherent,  i.e. 
satisfied.  Then  for  suitable  Ac, 


Moreover,  from  (A.l), 


so  (A.2')  becomes 


— (ror)oC  =  2p(v2  o  £)r 


6cC  =  -2p{2[Q6cv  x]o£-r+[»  2  f  QSc]  oCfr} 

=  -2 p  {2(QSc)vc-3Ac  +  (fQ6c)v-1£  ( A 2^)  o} 

=  -2p{2(Q6c)Ac~2Ac 

+(/ Q6c)  (2AVp2 (AS)  +  c-xA^(^)) }  o  C 
=  —  2p  {2  [(Q5c)Ac_1  -(-  (/  <56c)p2A3c']  ( A£) 

+  [(/Q«c)c-‘A]  £(^)}°C. 

For  our  computations,  we  used  the  formulas  (A.l)  and  (A.27)  with 


(A(ror))o( 


computed  as  in  the  last  line  of  (A.l). 


37 


References. 


BAMBERGER,  A.,  G.  CHAVENT,  C.  HEMON,  and  P.  LAILLY  [1982],  Inver¬ 
sion  of  normal  incidence  seismograms,  Geophysics  4 7,  pp.  757-770. 

BEYLKIN,  G.  [1985].  Imaging  of  discontinuities  in  the  inverse  scattering  prob¬ 
lem  by  inversion  of  a  causal  generalized  Radon  transform,  J.  Math.  Phys.  26, 
pp.  99-108. 

BUBE,  K.,  P.  LAILLY,  P.  SACKS,  F.  SANTOSA,  and  W.  SYMES  [1987],  Si¬ 
multaneous  determination  of  source  wavelet  and  velocity  profile  using  impulsive 
point-source  data  from  a  layered  fluid,  Geophys.  J.  Roy.  Astr.  Soc.,  to  appear. 

BUBE,  K.,  D.B.  JOVANOVICH,  R.T.  KANGAN,  J.R.  RESNICK,  R.T.  SHUEY, 
and  D.A.  SPINDLER  [1985].  Well-determined  and  poorly  determined  features 
in  seismic  reflection  tomography:  Part  II,  55th  Annual  Meeting,  SEG,  Wash¬ 
ington,  DC. 

BEYLKIN,  G.  and  R.  BURRIDGE  [1987].  Linearized  inverse  scattering  prob¬ 
lems  of  acoustics  and  elasticity,  preprint;  also  Multiparameter  inversion  for 
acoustic  and  elastic  media,  Expanded  Abstract,  57th  Annual  International  Meet¬ 
ing,  Society  of  Exploration  Geophysicists,  New  Orleans,  pp.  747-749. 

CANADAS,  G.  and  P.  KOLB  [1986]  Least-squares  inversion  of  prestack  data: 
simultaneous  identificatio  of  density  and  velocity  of  stratified  media,  Expanded 
abstract,  56th  Annual  International  Meeting,  Society  of  Exploration  Geophysi¬ 
cists,  Houston,  pp.  604-607. 

CHAN,  T.  [1985].  Deflated  Lanczos  procedures  for  solving  nearly  singular  lin¬ 
ear  systems,  research  report  YALEU/DCS/RR-403,  Department  of  Computer 
Science,  Yale  University. 

CLAYTON,  R.  and  R.  STOLT  [1981],  A  Born-WKBJ  inversion  method  for 
acoustic  reflection  data,  Geophysics,  pp.  1559-1567. 

COHEN,  J.K.  and  N.  BLEISTEIN  [1979].  Velocity  inversion  procedure  for 
acoustic  waves,  Geophysics  44 ,  PP-  1077-1085. 

DENNIS,  J.  E.  JR.  and  R.  B.  SCHNABEL  [1983].  Numerical  Methods  for  Un¬ 
constrained  Optimization  and  Nonlinear  Equations,  Prentice-Hall,  Englewood 
Cliffs. 

DUISTERMAAT,  J.  [1975].  Fourier  Integral  Operators,  Courant  Institute  Lec¬ 
ture  Notes. 


37 


FOWLER,  P.  [1986]  Migration  velocity  analysis  by  optimization:  linear  theory, 
Expanded  Abstract,  56th  Annual  International  Meeting,  Society  of  Exploration 
Geophysicists,  Houston,  pp.  660-662. 

GAUTHIER,  O. ,  TARANTOLA,  A.  and  VIRIEUX,  J.  [1986].  Two-dimensional 
nonlinear  inversion  of  seismic  waveforms,  Geophysics  51,  pp.  1387-1403. 

GOLUB,  G.  and  C.  VAN  LOAN  [1983].  Matrix  Computations,  The  Johns 
Hopkins  University  Press,  Baltimore. 

GRAY,  S.H.  [1980].  A  second-order  procedure  for  one-dimensional  velocity  in¬ 
version,  SIAM  J.  Appl.  Math.  39,  pp.  456-462. 

GRAY,  S.  and  F.  HAGIN  [1984],  Travel-time-like  variables  and  the  solution  of 
velocity  inverse  problems,  preprint. 

HADJEE,  Y.  and  F.  COLLINO  [1988].  A  geometrical  approach  to  the  a-priori 
study  of  the  1-d  inverse  problem,  IFP  preprint. 

IKELLE,  L.  ,  J.  DIET  and  A.  TARANTOLA  [1988].  Linearized  inversion  of 
multioffset  seismic  data  in  the  omega-k  domain:  depth-dependent  reference 
medium,  Geophysics  53,  pp.  50-64. 

KLEYN,  A.  H.  [1983].  Seismic  Reflection  Interpretation,  Applied  Science  Pub¬ 
lishers,  New  York. 

KOLB  P.,  F.  COLLINO,  and  P.  LAILLY  [1986].  Prestack  inversion  of  a  ID 
medium,  Proc.  IEEE  7 4,  pp.498-506. 

LAILLY,  P.  [1984].  Migration  methods:  partial  but  efficient  solutions  to  the 
seismic  inverse  problem,  in  Inverse  Problems  of  Acoustic  and  Elastic  Waves,  ed. 
Santosa  et  al.,  SIAM,  Philadelphia. 

LINES,  L.  and  S.  TREITEL  [1984],  A  review  of  least-squares  inversion  and  its 
application  to  geophysical  problems,  Geophysical  Prospecting  32,  pp.  159-186. 

LINES,  L.,  J.  SCALES  and  S.  TREITEL  [1987]  (preprint  on  seismic  tomogra¬ 
phy,  to  appear  in  J.  Comp.  Phys.) 

LINES,  L.,  A.  SCHULTZ  and  S.  TREITEL  [1988].  Coperative  inversion  of 
geophysical  data,  Geophysics  53,  pp.  8-20. 

MCAULAY  [1985].  Prestack  inversion  with  plane-layer  point  source  modeling, 
Geophysics  50,  pp.  77-89. 

MEZA,  J.  and  W.  SYMES  [1987].  Deflated  Krylov  subspace  methods  for  nearly 
singular  linear  systems,  Technical  Reprot  87-3,  Department  of  Mathematical 
Sciences,  Rice  University. 


38 


MORA,  P.  [1986],  Nonlinear  2-d  elastic  inversion  of  multi-offset  seismic  data, 
Expanded  abstract,  56th  Annual  International  Meeting,  Society  of  Exploration 
Geophysicists,  Houston,  pp.  533-537;  also  Geophysics  52. 

MORA,  P.  [1987],  Nonlinear  2-d  elastic  inversion  of  real  data,  Expanded  ab¬ 
stract,  57th  Annual  International  Meeting,  Society  of  Exploration  Geophysicists 
New  Orleans,  pp.  430-432. 

RAKESH  [1988].  A  linearized  inverse  problem  for  the  wave  equation  Comm 
on  P.  D.  E.  13,  pp.  573-601. 

SACKS,  P.  and  W.  SYMES  [1987],  Recovery  of  the  elastic  parameters  of  a 
layered  half-space,  Geophys.  J.  Roy.  Astr.  Soc.  88,  pp.  593-620. 

SANTOSA,  F.  and  W.  SYMES  [1986],  Least-squares  principles  for  layered 
velocity  inversion,  to  appear  in  Society  of  Exploration  Geophysicists  Monograph 
Series  (1988-89). 

SANTOSA,  F.  and  W.  SYMES  [1987].  Bandlimited  Velocity  Inversion,  Ex¬ 
panded  Abstract,  57th  Annual  International  Meeting,  Society  of  Exploration 
Geophysicists,  New  Orleans,  pp.  437-439. 

SPRATT,  S.  [1987].  Effect  of  normal  moveout  errors  on  amplitude  versus  offset- 
derived  shear  reflectivity,  Expanded  Abstract,  57th  Annual  International  Meet¬ 
ing,  Society  of  Exploration  Geophysicists,  New  Orleans,  pp. 634-637. 

STEIHAUG,  T.  [1981].  Quasi-Newton  methods  for  large-scale  nonlinear  prob¬ 
lems,  Ph.D.  Thesis,  Yale  University. 

SYMES,  W.  [1985].  Stability  and  instability  results  for  inverse  problems  in 
several-dimensional  wave  propagation,  Proc.  of,Seventh  Internat.  Conf.  on 
Computing  Methods  in  Applied  Science  and  Engineering  (INRIA). 

SYMES,  W.  [1988].  Bandlimited  Velocity  inversion:  a  model  inverse  problem 
from  reflection  seismology,  preprint. 

TARANTOLA,  A.  and  B.  VALLETTE  [1982],  Inverse  problems:  quest  for 
information,  J.  Geophys.  50,  pp.  159-170. 

TARANTOLA,  A.  [1984],  The  seismic  reflection  inverse  problem,  in  Inverse 
Problems  of  Acoustic  and  Elastic  Waves,  ed.  Santosaet  al.,  SIAM,  Philadelphia. 

TARANTOLA,  A.  [1986],  A  strategy  for  nonlinear  elastic  inversion  of  seismic 
reflection  data,  Geophysics  51,  pp.  1893- 1903. 


39 


FIGURE  CAPTIONS 


1.  Smooth  velocity  model  (upper  curve)  and  perturbation  (lower  curve): 
the  latter  is  non-zero  only  where  the  former  is  constant  (this  is  the 
situation  analysed  in  section  3).  Horizontal  scale  is  one-way  time  at 
surface  velocity. 

2.  The  coefficient  A  (display  (18)),  normalized  to  remove  velocity  depen¬ 
dence. 


3.  The  coefficient  Aj  (display  (19)),  normalized  to  remove  velocity  depen¬ 
dence. 


4.  (a)  cubic  b-splines:  5th  (solid)  and  7th  (dashed)  of  12  nodes  in  interval; 
(b)  indefinite  integrals  of  splines  in  (a). 

5.  Differences:  6th  -  5th  (solid),  7th  -  6th  (dashed)  spline  integrals. 

6.  Smooth  velocity  model  (upper  curve)  and  velocity  perturbation  (lower 
curve)  used  to  generate  data  for  coherency  optimization  experiments. 

7.  Eigenvalues  pPmax ,  p^\n  ,  and  fimin  as  in  displays  (20)  and  (21). 

8.  Plane  wave  section  generated  from  model  displayed  in  Figure  6,  using 
a  Ricker  wavelet  peaked  at  20  Hz. 
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9.  (0)  target  model,  from  Figure  6  (solid  curve); 

(1)  a  non-constant  initial  model  (short  dashes); 

(2)  result  of  5  Gauss-Newton  steps  starting  at  constant  model 
(long  dashes); 

(3)  result  of  10  Gauss-Newton  steps  starting  at  constant  model 
(dash/cross); 

(4)  result  of  5  Gauss-Newton  steps  starting  at  model  (1)  (short/long 
dashes). 

10.  Results  using  too  many  degrees  of  freedom  in  model  (16  nodes): 

(0)  target  model  (solid  curve); 

(1)  Gauss-Newton  steps  (short  dashes); 

(2)  10  Gauss-Newton  steps  (long  dashes); 

(3)  30  Gauss-Newton  steps  (short/long  dashes) 


11.  Two-way  travel  time  curves  at  normal  incidence: 

(0)  target  model  (solid  curve); 

(1)  curve  2,  Figure  9  (short  dashes); 

(2)  curve  1,  Figure  10  (long  dashes); 

(3)  curve  2,  Figure  10  (short/long  dashes). 

12.  Two-way  travel-time  errors: 

(0)  between  curves  (0)  and  (1),  Figure  11  (solid  line); 

(1)  between  curves  (0)  and  (2),  Figure  11  (short  dashes); 

(2)  between  curves  (0)  and  (3),  Figure  11  (long  dashes). 
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13.  Stack  (solid  line)  of  reflectivity  estimate  corresponding  to  Figure  9, 
curve  2,  at  constant  velocity,  compared  with  target  velocity  perturba¬ 
tion  (dashed  line). 


14.  Stack  (solid  line)  of  reflectivity  estimate  corresponding  to  Figure  9, 
curve  2,  at  velocity  given  by  Figure  9,  curve  2,  compared  with  target 
velocity  perturbation  (dashed  line). 


15.  “Synthetic”  section  produced  using  velocity  from  Figure  9,  curve  2, 
with  stacked  velocity  perturbation  estimate  from  Figure  14. 


16.  Difference  of  Figures  15  and  8. 
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